Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'drm-xe-next-2026-03-02' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next

UAPI Changes:
- restrict multi-lrc to VCS/VECS engines (Xin Wang)
- Introduce a flag to disallow vm overcommit in fault mode (Thomas)
- update used tracking kernel-doc (Auld, Fixes)
- Some bind queue fixes (Auld, Fixes)

Cross-subsystem Changes:
- Split drm_suballoc_new() into SA alloc and init helpers (Satya, Fixes)
- pass pagemap_addr by reference (Arnd, Fixes)
- Revert "drm/pagemap: Disable device-to-device migration" (Thomas)
- Fix unbalanced unlock in drm_gpusvm_scan_mm (Maciej, Fixes)
- Small GPUSVM fixes (Brost, Fixes)
- Fix xe SVM configs (Thomas, Fixes)

Core Changes:
- Fix a hmm_range_fault() livelock / starvation problem (Thomas, Fixes)

Driver Changes:
- Fix leak on xa_store failure (Shuicheng, Fixes)
- Correct implementation of Wa_16025250150 (Roper, Fixes)
- Refactor context init into xe_lrc_ctx_init (Raag)
- Fix GSC proxy cleanup on early initialization failure (Zhanjun)
- Fix exec queue creation during post-migration recovery (Tomasz, Fixes)
- Apply windower hardware filtering setting on Xe3 and Xe3p (Roper)
- Free ctx_restore_mid_bb in release (Shuicheng, Fixes)
- Drop stale MCR steering TODO comment (Roper)
- dGPU memory optimizations (Brost)
- Do not preempt fence signaling CS instructions (Brost, Fixes)
- Revert "drm/xe/compat: Remove unused i915_reg.h from compat header" (Uma)
- Don't expose display modparam if no display support (Wajdeczko)
- Some VRAM flag improvements (Wajdeczko)
- Misc fix for xe_guc_ct.c (Shuicheng, Fixes)
- Remove unused i915_reg.h from compat header (Uma)
- Workaround cleanup & simplification (Roper)
- Add prefetch pagefault support for Xe3p (Varun)
- Fix fs_reclaim deadlock caused by CCS save/restore (Satya, Fixes)
- Cleanup partially initialized sync on parse failure (Shuicheng, Fixes)
- Allow to change VFs VRAM quota using sysfs (Michal)
- Increase GuC log sizes in debug builds (Tomasz)
- Wa_18041344222 changes (Harish)
- Add Wa_14026781792 (Niton)
- Add debugfs facility to catch RTP mistakes (Roper)
- Convert GT stats to per-cpu counters (Brost)
- Prevent unintended VRAM channel creation (Karthik)
- Privatize struct xe_ggtt (Maarten)
- remove unnecessary struct dram_info forward declaration (Jani)
- pagefault refactors (Brost)
- Apply Wa_14024997852 (Arvind)
- Redirect faults to dummy page for wedged device (Raag, Fixes)
- Force EXEC_QUEUE_FLAG_KERNEL for kernel internal VMs (Piotr)
- Stop applying Wa_16018737384 from Xe3 onward (Roper)
- Add new XeCore fuse registers to VF runtime regs (Roper)
- Update xe_device_declare_wedged() error log (Raag)
- Make xe_modparam.force_vram_bar_size signed (Shuicheng, Fixes)
- Avoid reading media version when media GT is disabled (Piotr, Fixes)
- Fix handling of Wa_14019988906 & Wa_14019877138 (Roper, Fixes)
- Basic enabling patches for Xe3p_LPG and NVL-P (Gustavo, Roper, Shekhar)
- Avoid double-adjust in 64-bit reads (Shuicheng, Fixes)
- Allow VF to initialize MCR tables (Wajdeczko)
- Add Wa_14025883347 for GuC DMA failure on reset (Anirban)
- Add bounds check on pat_index to prevent OOB kernel read in madvise (Jia, Fixes)
- Fix the address range assert in ggtt_get_pte helper (Winiarski)
- XeCore fuse register changes (Roper)
- Add more info to powergate_info debugfs (Vinay)
- Separate out GuC RC code (Vinay)
- Fix g2g_test_array indexing (Pallavi)
- Mutual exclusivity between CCS-mode and PF (Nareshkumar, Fixes)
- Some more _types.h cleanups (Wajdeczko)
- Fix sysfs initialization (Wajdeczko, Fixes)
- Drop unnecessary goto in xe_device_create (Roper)
- Disable D3Cold for BMG only on specific platforms (Karthik, Fixes)
- Add sriov.admin_only_pf attribute (Wajdeczko)
- replace old wq(s), add WQ_PERCPU to alloc_workqueue (Marco)
- Make MMIO communication more robust (Wajdeczko)
- Fix warning of kerneldoc (Shuicheng, Fixes)
- Fix topology query pointer advance (Shuicheng, Fixes)
- use entry_dump callbacks for xe2+ PAT dumps (Xin Wang)
- Fix kernel-doc warning in GuC scheduler ABI header (Chaitanya, Fixes)
- Fix CFI violation in debugfs access (Daniele, Fixes)
- Apply WA_16028005424 to Media (Balasubramani)
- Fix typo in function kernel-doc (Wajdeczko)
- Protect priority against concurrent access (Niranjana)
- Fix nvm aux resource cleanup (Shuicheng, Fixes)
- Fix is_bound() pci_dev lifetime (Shuicheng, Fixes)
- Use CLASS() for forcewake in xe_gt_enable_comp_1wcoh (Shuicheng)
- Reset VF GuC state on fini (Wajdeczko)
- Move _THIS_IP_ usage from xe_vm_create() to dedicated function (Nathan Chancellor, Fixes)
- Unregister drm device on probe error (Shuicheng, Fixes)
- Disable DCC on PTL (Vinay, Fixes)
- Fix Wa_18022495364 (Tvrtko, Fixes)
- Skip address copy for sync-only execs (Shuicheng, Fixes)
- derive mem copy capability from graphics version (Nitin, Fixes)
- Use DRM_BUDDY_CONTIGUOUS_ALLOCATION for contiguous allocations (Sanjay)
- Context based TLB invalidations (Brost)
- Enable multi_queue on xe3p_xpc (Brost, Niranjana)
- Remove check for gt in xe_query (Nakshtra)
- Reduce LRC timestamp stuck message on VFs to notice (Brost, Fixes)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/aaYR5G2MHjOEMXPW@lstrano-desk.jf.intel.com

+3584 -1862
+31
Documentation/ABI/testing/sysfs-driver-intel-xe-sriov
··· 129 129 -EIO if FW refuses to change the provisioning. 130 130 131 131 132 + What: /sys/bus/pci/drivers/xe/.../sriov_admin/.bulk_profile/vram_quota 133 + What: /sys/bus/pci/drivers/xe/.../sriov_admin/vf<n>/profile/vram_quota 134 + Date: February 2026 135 + KernelVersion: 7.0 136 + Contact: intel-xe@lists.freedesktop.org 137 + Description: 138 + These files allow to perform initial VFs VRAM provisioning prior to VFs 139 + enabling or to change VFs VRAM provisioning once the VFs are enabled. 140 + Any non-zero initial VRAM provisioning will block VFs auto-provisioning. 141 + Without initial VRAM provisioning those files will show result of the 142 + VRAM auto-provisioning performed by the PF once the VFs are enabled. 143 + Once the VFs are disabled, all VRAM provisioning will be released. 144 + These files are visible only on discrete Intel Xe platforms with VRAM 145 + and are writeable only if dynamic VFs VRAM provisioning is supported. 146 + 147 + .bulk_profile/vram_quota: (WO) unsigned integer 148 + The amount of the provisioned VRAM in [bytes] for each VF. 149 + Actual quota value might be aligned per HW/FW requirements. 150 + 151 + profile/vram_quota: (RW) unsigned integer 152 + The amount of the provisioned VRAM in [bytes] for this VF. 153 + Actual quota value might be aligned per HW/FW requirements. 154 + 155 + Default is 0 (unprovisioned). 156 + 157 + Writes to these attributes may fail with errors like: 158 + -EINVAL if provided input is malformed or not recognized, 159 + -EPERM if change is not applicable on given HW/FW, 160 + -EIO if FW refuses to change the provisioning. 161 + 162 + 132 163 What: /sys/bus/pci/drivers/xe/.../sriov_admin/vf<n>/stop 133 164 Date: October 2025 134 165 KernelVersion: 6.19
+3
Documentation/gpu/xe/xe_firmware.rst
··· 31 31 .. kernel-doc:: drivers/gpu/drm/xe/xe_guc_pc.c 32 32 :doc: GuC Power Conservation (PC) 33 33 34 + .. kernel-doc:: drivers/gpu/drm/xe/xe_guc_rc.c 35 + :doc: GuC Render C-states (GuC RC) 36 + 34 37 PCIe Gen5 Limitations 35 38 ===================== 36 39
+4 -3
drivers/gpu/drm/drm_gpusvm.c
··· 819 819 820 820 if (!(pfns[i] & HMM_PFN_VALID)) { 821 821 state = DRM_GPUSVM_SCAN_UNPOPULATED; 822 - goto err_free; 822 + break; 823 823 } 824 824 825 825 page = hmm_pfn_to_page(pfns[i]); ··· 856 856 i += 1ul << drm_gpusvm_hmm_pfn_to_order(pfns[i], i, npages); 857 857 } 858 858 859 - err_free: 860 859 drm_gpusvm_notifier_unlock(range->gpusvm); 861 860 861 + err_free: 862 862 kvfree(pfns); 863 863 return state; 864 864 } ··· 1495 1495 } 1496 1496 zdd = page->zone_device_data; 1497 1497 if (pagemap != page_pgmap(page)) { 1498 - if (i > 0) { 1498 + if (pagemap) { 1499 1499 err = -EOPNOTSUPP; 1500 1500 goto err_unmap; 1501 1501 } ··· 1572 1572 return 0; 1573 1573 1574 1574 err_unmap: 1575 + svm_pages->flags.has_dma_mapping = true; 1575 1576 __drm_gpusvm_unmap_pages(gpusvm, svm_pages, num_dma_mapped); 1576 1577 drm_gpusvm_notifier_unlock(gpusvm); 1577 1578 err_free:
+2 -12
drivers/gpu/drm/drm_pagemap.c
··· 480 480 .start = start, 481 481 .end = end, 482 482 .pgmap_owner = pagemap->owner, 483 - /* 484 - * FIXME: MIGRATE_VMA_SELECT_DEVICE_PRIVATE intermittently 485 - * causes 'xe_exec_system_allocator --r *race*no*' to trigger aa 486 - * engine reset and a hard hang due to getting stuck on a folio 487 - * lock. This should work and needs to be root-caused. The only 488 - * downside of not selecting MIGRATE_VMA_SELECT_DEVICE_PRIVATE 489 - * is that device-to-device migrations won’t work; instead, 490 - * memory will bounce through system memory. This path should be 491 - * rare and only occur when the madvise attributes of memory are 492 - * changed or atomics are being used. 493 - */ 494 - .flags = MIGRATE_VMA_SELECT_SYSTEM | MIGRATE_VMA_SELECT_DEVICE_COHERENT, 483 + .flags = MIGRATE_VMA_SELECT_SYSTEM | MIGRATE_VMA_SELECT_DEVICE_COHERENT | 484 + MIGRATE_VMA_SELECT_DEVICE_PRIVATE, 495 485 }; 496 486 unsigned long i, npages = npages_in_range(start, end); 497 487 unsigned long own_pages = 0, migrated_pages = 0;
+86 -20
drivers/gpu/drm/drm_suballoc.c
··· 293 293 } 294 294 295 295 /** 296 - * drm_suballoc_new() - Make a suballocation. 296 + * drm_suballoc_alloc() - Allocate uninitialized suballoc object. 297 + * @gfp: gfp flags used for memory allocation. 298 + * 299 + * Allocate memory for an uninitialized suballoc object. Intended usage is 300 + * allocate memory for suballoc object outside of a reclaim tainted context 301 + * and then be initialized at a later time in a reclaim tainted context. 302 + * 303 + * @drm_suballoc_free() should be used to release the memory if returned 304 + * suballoc object is in uninitialized state. 305 + * 306 + * Return: a new uninitialized suballoc object, or an ERR_PTR(-ENOMEM). 307 + */ 308 + struct drm_suballoc *drm_suballoc_alloc(gfp_t gfp) 309 + { 310 + struct drm_suballoc *sa; 311 + 312 + sa = kmalloc_obj(*sa, gfp); 313 + if (!sa) 314 + return ERR_PTR(-ENOMEM); 315 + 316 + sa->manager = NULL; 317 + 318 + return sa; 319 + } 320 + EXPORT_SYMBOL(drm_suballoc_alloc); 321 + 322 + /** 323 + * drm_suballoc_insert() - Initialize a suballocation and insert a hole. 297 324 * @sa_manager: pointer to the sa_manager 325 + * @sa: The struct drm_suballoc. 298 326 * @size: number of bytes we want to suballocate. 299 - * @gfp: gfp flags used for memory allocation. Typically GFP_KERNEL but 300 - * the argument is provided for suballocations from reclaim context or 301 - * where the caller wants to avoid pipelining rather than wait for 302 - * reclaim. 303 327 * @intr: Whether to perform waits interruptible. This should typically 304 328 * always be true, unless the caller needs to propagate a 305 329 * non-interruptible context from above layers. 306 330 * @align: Alignment. Must not exceed the default manager alignment. 307 331 * If @align is zero, then the manager alignment is used. 308 332 * 309 - * Try to make a suballocation of size @size, which will be rounded 310 - * up to the alignment specified in specified in drm_suballoc_manager_init(). 333 + * Try to make a suballocation on a pre-allocated suballoc object of size @size, 334 + * which will be rounded up to the alignment specified in specified in 335 + * drm_suballoc_manager_init(). 311 336 * 312 - * Return: a new suballocated bo, or an ERR_PTR. 337 + * Return: zero on success, errno on failure. 
313 338 */ 314 - struct drm_suballoc * 315 - drm_suballoc_new(struct drm_suballoc_manager *sa_manager, size_t size, 316 - gfp_t gfp, bool intr, size_t align) 339 + int drm_suballoc_insert(struct drm_suballoc_manager *sa_manager, 340 + struct drm_suballoc *sa, size_t size, 341 + bool intr, size_t align) 317 342 { 318 343 struct dma_fence *fences[DRM_SUBALLOC_MAX_QUEUES]; 319 344 unsigned int tries[DRM_SUBALLOC_MAX_QUEUES]; 320 345 unsigned int count; 321 346 int i, r; 322 - struct drm_suballoc *sa; 323 347 324 348 if (WARN_ON_ONCE(align > sa_manager->align)) 325 - return ERR_PTR(-EINVAL); 349 + return -EINVAL; 326 350 if (WARN_ON_ONCE(size > sa_manager->size || !size)) 327 - return ERR_PTR(-EINVAL); 351 + return -EINVAL; 328 352 329 353 if (!align) 330 354 align = sa_manager->align; 331 355 332 - sa = kmalloc_obj(*sa, gfp); 333 - if (!sa) 334 - return ERR_PTR(-ENOMEM); 335 356 sa->manager = sa_manager; 336 357 sa->fence = NULL; 337 358 INIT_LIST_HEAD(&sa->olist); ··· 369 348 if (drm_suballoc_try_alloc(sa_manager, sa, 370 349 size, align)) { 371 350 spin_unlock(&sa_manager->wq.lock); 372 - return sa; 351 + return 0; 373 352 } 374 353 375 354 /* see if we can skip over some allocations */ ··· 406 385 } while (!r); 407 386 408 387 spin_unlock(&sa_manager->wq.lock); 409 - kfree(sa); 410 - return ERR_PTR(r); 388 + sa->manager = NULL; 389 + return r; 390 + } 391 + EXPORT_SYMBOL(drm_suballoc_insert); 392 + 393 + /** 394 + * drm_suballoc_new() - Make a suballocation. 395 + * @sa_manager: pointer to the sa_manager 396 + * @size: number of bytes we want to suballocate. 397 + * @gfp: gfp flags used for memory allocation. Typically GFP_KERNEL but 398 + * the argument is provided for suballocations from reclaim context or 399 + * where the caller wants to avoid pipelining rather than wait for 400 + * reclaim. 401 + * @intr: Whether to perform waits interruptible. This should typically 402 + * always be true, unless the caller needs to propagate a 403 + * non-interruptible context from above layers. 404 + * @align: Alignment. Must not exceed the default manager alignment. 405 + * If @align is zero, then the manager alignment is used. 406 + * 407 + * Try to make a suballocation of size @size, which will be rounded 408 + * up to the alignment specified in specified in drm_suballoc_manager_init(). 409 + * 410 + * Return: a new suballocated bo, or an ERR_PTR. 411 + */ 412 + struct drm_suballoc * 413 + drm_suballoc_new(struct drm_suballoc_manager *sa_manager, size_t size, 414 + gfp_t gfp, bool intr, size_t align) 415 + { 416 + struct drm_suballoc *sa; 417 + int err; 418 + 419 + sa = drm_suballoc_alloc(gfp); 420 + if (IS_ERR(sa)) 421 + return sa; 422 + 423 + err = drm_suballoc_insert(sa_manager, sa, size, intr, align); 424 + if (err) { 425 + drm_suballoc_free(sa, NULL); 426 + return ERR_PTR(err); 427 + } 428 + 429 + return sa; 411 430 } 412 431 EXPORT_SYMBOL(drm_suballoc_new); 413 432 ··· 465 404 466 405 if (!suballoc) 467 406 return; 407 + 408 + if (!suballoc->manager) { 409 + kfree(suballoc); 410 + return; 411 + } 468 412 469 413 sa_manager = suballoc->manager; 470 414
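For clarity, here is a minimal usage sketch (not part of the patch) of the split API described by the kernel-doc above; drm_suballoc_new() itself is now implemented as exactly this alloc-then-insert sequence. The manager pointer, size and alignment below are placeholders.

static int example_two_step_suballoc(struct drm_suballoc_manager *mgr,
				     struct drm_suballoc **out)
{
	struct drm_suballoc *sa;
	int err;

	/* Step 1: plain memory allocation, safe outside reclaim-tainted context. */
	sa = drm_suballoc_alloc(GFP_KERNEL);
	if (IS_ERR(sa))
		return PTR_ERR(sa);

	/* Step 2: later, possibly under reclaim, carve out the actual hole. */
	err = drm_suballoc_insert(mgr, sa, SZ_4K, true, 0);
	if (err) {
		/* sa->manager is still NULL here, so this only frees the memory. */
		drm_suballoc_free(sa, NULL);
		return err;
	}

	*out = sa;
	return 0;
}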
+1
drivers/gpu/drm/i915/display/intel_display_device.c
··· 1500 1500 INTEL_PTL_IDS(INTEL_DISPLAY_DEVICE, &ptl_desc), 1501 1501 INTEL_WCL_IDS(INTEL_DISPLAY_DEVICE, &ptl_desc), 1502 1502 INTEL_NVLS_IDS(INTEL_DISPLAY_DEVICE, &nvl_desc), 1503 + INTEL_NVLP_IDS(INTEL_DISPLAY_DEVICE, &nvl_desc), 1503 1504 }; 1504 1505 1505 1506 static const struct {
+1
drivers/gpu/drm/xe/Makefile
··· 74 74 xe_guc_log.o \ 75 75 xe_guc_pagefault.o \ 76 76 xe_guc_pc.o \ 77 + xe_guc_rc.o \ 77 78 xe_guc_submit.o \ 78 79 xe_guc_tlb_inval.o \ 79 80 xe_heci_gsc.o \
+2 -3
drivers/gpu/drm/xe/display/xe_fb_pin.c
··· 256 256 size = intel_rotation_info_size(&view->rotated) * XE_PAGE_SIZE; 257 257 258 258 pte = xe_ggtt_encode_pte_flags(ggtt, bo, xe->pat.idx[XE_CACHE_NONE]); 259 - vma->node = xe_ggtt_node_insert_transform(ggtt, bo, pte, 259 + vma->node = xe_ggtt_insert_node_transform(ggtt, bo, pte, 260 260 ALIGN(size, align), align, 261 261 view->type == I915_GTT_VIEW_NORMAL ? 262 262 NULL : write_ggtt_rotated_node, ··· 352 352 353 353 if (vma->dpt) 354 354 xe_bo_unpin_map_no_vm(vma->dpt); 355 - else if (!xe_ggtt_node_allocated(vma->bo->ggtt_node[tile_id]) || 356 - vma->bo->ggtt_node[tile_id] != vma->node) 355 + else if (vma->bo->ggtt_node[tile_id] != vma->node) 357 356 xe_ggtt_node_remove(vma->node, false); 358 357 359 358 ttm_bo_reserve(&vma->bo->ttm, false, false, NULL);
+10
drivers/gpu/drm/xe/instructions/xe_gfxpipe_commands.h
··· 55 55 #define PIPELINE_SELECT GFXPIPE_SINGLE_DW_CMD(0x1, 0x4) 56 56 57 57 #define CMD_3DSTATE_DRAWING_RECTANGLE_FAST GFXPIPE_3D_CMD(0x0, 0x0) 58 + #define CMD_3DSTATE_CUSTOM_SAMPLE_PATTERN GFXPIPE_3D_CMD(0x0, 0x2) 58 59 #define CMD_3DSTATE_CLEAR_PARAMS GFXPIPE_3D_CMD(0x0, 0x4) 59 60 #define CMD_3DSTATE_DEPTH_BUFFER GFXPIPE_3D_CMD(0x0, 0x5) 60 61 #define CMD_3DSTATE_STENCIL_BUFFER GFXPIPE_3D_CMD(0x0, 0x6) ··· 139 138 #define CMD_3DSTATE_SBE_MESH GFXPIPE_3D_CMD(0x0, 0x82) 140 139 #define CMD_3DSTATE_CPSIZE_CONTROL_BUFFER GFXPIPE_3D_CMD(0x0, 0x83) 141 140 #define CMD_3DSTATE_COARSE_PIXEL GFXPIPE_3D_CMD(0x0, 0x89) 141 + #define CMD_3DSTATE_MESH_SHADER_DATA_EXT GFXPIPE_3D_CMD(0x0, 0x8A) 142 + #define CMD_3DSTATE_TASK_SHADER_DATA_EXT GFXPIPE_3D_CMD(0x0, 0x8B) 143 + #define CMD_3DSTATE_VIEWPORT_STATE_POINTERS_CC_2 GFXPIPE_3D_CMD(0x0, 0x8D) 144 + #define CMD_3DSTATE_CC_STATE_POINTERS_2 GFXPIPE_3D_CMD(0x0, 0x8E) 145 + #define CMD_3DSTATE_SCISSOR_STATE_POINTERS_2 GFXPIPE_3D_CMD(0x0, 0x8F) 146 + #define CMD_3DSTATE_BLEND_STATE_POINTERS_2 GFXPIPE_3D_CMD(0x0, 0xA0) 147 + #define CMD_3DSTATE_VIEWPORT_STATE_POINTERS_SF_CLIP_2 GFXPIPE_3D_CMD(0x0, 0xA1) 142 148 143 149 #define CMD_3DSTATE_DRAWING_RECTANGLE GFXPIPE_3D_CMD(0x1, 0x0) 150 + #define CMD_3DSTATE_URB_MEMORY GFXPIPE_3D_CMD(0x1, 0x1) 144 151 #define CMD_3DSTATE_CHROMA_KEY GFXPIPE_3D_CMD(0x1, 0x4) 145 152 #define CMD_3DSTATE_POLY_STIPPLE_OFFSET GFXPIPE_3D_CMD(0x1, 0x6) 146 153 #define CMD_3DSTATE_POLY_STIPPLE_PATTERN GFXPIPE_3D_CMD(0x1, 0x7) ··· 169 160 #define CMD_3DSTATE_SUBSLICE_HASH_TABLE GFXPIPE_3D_CMD(0x1, 0x1F) 170 161 #define CMD_3DSTATE_SLICE_TABLE_STATE_POINTERS GFXPIPE_3D_CMD(0x1, 0x20) 171 162 #define CMD_3DSTATE_PTBR_TILE_PASS_INFO GFXPIPE_3D_CMD(0x1, 0x22) 163 + #define CMD_3DSTATE_SLICE_TABLE_STATE_POINTER_2 GFXPIPE_3D_CMD(0x1, 0xA0) 172 164 173 165 #endif
+24 -1
drivers/gpu/drm/xe/regs/xe_gt_regs.h
··· 58 58 #define MCR_SLICE(slice) REG_FIELD_PREP(MCR_SLICE_MASK, slice) 59 59 #define MCR_SUBSLICE_MASK REG_GENMASK(26, 24) 60 60 #define MCR_SUBSLICE(subslice) REG_FIELD_PREP(MCR_SUBSLICE_MASK, subslice) 61 - #define MTL_MCR_GROUPID REG_GENMASK(11, 8) 61 + #define MTL_MCR_GROUPID REG_GENMASK(12, 8) 62 62 #define MTL_MCR_INSTANCEID REG_GENMASK(3, 0) 63 63 64 64 #define PS_INVOCATION_COUNT XE_REG(0x2348) ··· 100 100 #define VE1_AUX_INV XE_REG(0x42b8) 101 101 #define AUX_INV REG_BIT(0) 102 102 103 + #define GAMSTLB_CTRL2 XE_REG_MCR(0x4788) 104 + #define STLB_SINGLE_BANK_MODE REG_BIT(11) 105 + 103 106 #define XE2_LMEM_CFG XE_REG(0x48b0) 104 107 105 108 #define XE2_GAMWALK_CTRL 0x47e4 106 109 #define XE2_GAMWALK_CTRL_MEDIA XE_REG(XE2_GAMWALK_CTRL + MEDIA_GT_GSI_OFFSET) 107 110 #define XE2_GAMWALK_CTRL_3D XE_REG_MCR(XE2_GAMWALK_CTRL) 108 111 #define EN_CMP_1WCOH_GW REG_BIT(14) 112 + 113 + #define MMIOATSREQLIMIT_GAM_WALK_3D XE_REG_MCR(0x47f8) 114 + #define DIS_ATS_WRONLY_PG REG_BIT(18) 109 115 110 116 #define XEHP_FLAT_CCS_BASE_ADDR XE_REG_MCR(0x4910) 111 117 #define XEHP_FLAT_CCS_PTR REG_GENMASK(31, 8) ··· 131 125 #define VS_HIT_MAX_VALUE_MASK REG_GENMASK(25, 20) 132 126 #define DIS_MESH_PARTIAL_AUTOSTRIP REG_BIT(16) 133 127 #define DIS_MESH_AUTOSTRIP REG_BIT(15) 128 + #define DIS_TE_PATCH_CTRL REG_BIT(4) 134 129 135 130 #define VFLSKPD XE_REG_MCR(0x62a8, XE_REG_OPTION_MASKED) 136 131 #define DIS_PARTIAL_AUTOSTRIP REG_BIT(9) ··· 176 169 #define COMMON_SLICE_CHICKEN4 XE_REG(0x7300, XE_REG_OPTION_MASKED) 177 170 #define SBE_PUSH_CONSTANT_BEHIND_FIX_ENABLE REG_BIT(12) 178 171 #define DISABLE_TDC_LOAD_BALANCING_CALC REG_BIT(6) 172 + #define HW_FILTERING REG_BIT(5) 179 173 180 174 #define COMMON_SLICE_CHICKEN3 XE_REG(0x7304, XE_REG_OPTION_MASKED) 181 175 #define XEHP_COMMON_SLICE_CHICKEN3 XE_REG_MCR(0x7304, XE_REG_OPTION_MASKED) ··· 217 209 #define XE2_FLAT_CCS_BASE_UPPER_ADDR_MASK REG_GENMASK(7, 0) 218 210 219 211 #define GSCPSMI_BASE XE_REG(0x880c) 212 + 213 + #define CCCHKNREG2 XE_REG_MCR(0x881c) 214 + #define LOCALITYDIS REG_BIT(7) 220 215 221 216 #define CCCHKNREG1 XE_REG_MCR(0x8828) 222 217 #define L3CMPCTRL REG_BIT(23) ··· 264 253 #define XE2_GT_COMPUTE_DSS_2 XE_REG(0x914c) 265 254 #define XE2_GT_GEOMETRY_DSS_1 XE_REG(0x9150) 266 255 #define XE2_GT_GEOMETRY_DSS_2 XE_REG(0x9154) 256 + #define XE3P_XPC_GT_GEOMETRY_DSS_3 XE_REG(0x915c) 257 + #define XE3P_XPC_GT_COMPUTE_DSS_3 XE_REG(0x9160) 267 258 268 259 #define SERVICE_COPY_ENABLE XE_REG(0x9170) 269 260 #define FUSE_SERVICE_COPY_ENABLE_MASK REG_GENMASK(7, 0) ··· 380 367 #define FORCEWAKE_RENDER XE_REG(0xa278) 381 368 382 369 #define POWERGATE_DOMAIN_STATUS XE_REG(0xa2a0) 370 + #define GSC_AWAKE_STATUS REG_BIT(8) 383 371 #define MEDIA_SLICE3_AWAKE_STATUS REG_BIT(4) 384 372 #define MEDIA_SLICE2_AWAKE_STATUS REG_BIT(3) 385 373 #define MEDIA_SLICE1_AWAKE_STATUS REG_BIT(2) ··· 434 420 #define LSN_DIM_Z_WGT(value) REG_FIELD_PREP(LSN_DIM_Z_WGT_MASK, value) 435 421 436 422 #define L3SQCREG2 XE_REG_MCR(0xb104) 423 + #define L3_SQ_DISABLE_COAMA_2WAY_COH REG_BIT(30) 424 + #define L3_SQ_DISABLE_COAMA REG_BIT(22) 437 425 #define COMPMEMRD256BOVRFETCHEN REG_BIT(20) 438 426 439 427 #define L3SQCREG3 XE_REG_MCR(0xb108) ··· 475 459 #define FORCE_MISS_FTLB REG_BIT(3) 476 460 477 461 #define XEHP_GAMSTLB_CTRL XE_REG_MCR(0xcf4c) 462 + #define BANK_HASH_MODE REG_GENMASK(27, 26) 463 + #define BANK_HASH_4KB_MODE REG_FIELD_PREP(BANK_HASH_MODE, 0x3) 478 464 #define CONTROL_BLOCK_CLKGATE_DIS REG_BIT(12) 479 465 #define EGRESS_BLOCK_CLKGATE_DIS REG_BIT(11) 480 466 
#define TAG_BLOCK_CLKGATE_DIS REG_BIT(7) ··· 568 550 #define UGM_FRAGMENT_THRESHOLD_TO_3 REG_BIT(58 - 32) 569 551 #define DIS_CHAIN_2XSIMD8 REG_BIT(55 - 32) 570 552 #define XE2_ALLOC_DPA_STARVE_FIX_DIS REG_BIT(47 - 32) 553 + #define SAMPLER_LD_LSC_DISABLE REG_BIT(45 - 32) 571 554 #define ENABLE_SMP_LD_RENDER_SURFACE_CONTROL REG_BIT(44 - 32) 572 555 #define FORCE_SLM_FENCE_SCOPE_TO_TILE REG_BIT(42 - 32) 573 556 #define FORCE_UGM_FENCE_SCOPE_TO_TILE REG_BIT(41 - 32) 574 557 #define MAXREQS_PER_BANK REG_GENMASK(39 - 32, 37 - 32) 575 558 #define DISABLE_128B_EVICTION_COMMAND_UDW REG_BIT(36 - 32) 559 + #define LSCFE_SAME_ADDRESS_ATOMICS_COALESCING_DISABLE REG_BIT(35 - 32) 560 + 561 + #define ROW_CHICKEN5 XE_REG_MCR(0xe7f0) 562 + #define CPSS_AWARE_DIS REG_BIT(3) 576 563 577 564 #define SARB_CHICKEN1 XE_REG_MCR(0xe90c) 578 565 #define COMP_CKN_IN REG_GENMASK(30, 29)
+8
drivers/gpu/drm/xe/regs/xe_guc_regs.h
··· 40 40 #define GS_BOOTROM_JUMP_PASSED REG_FIELD_PREP(GS_BOOTROM_MASK, 0x76) 41 41 #define GS_MIA_IN_RESET REG_BIT(0) 42 42 43 + #define BOOT_HASH_CHK XE_REG(0xc010) 44 + #define GUC_BOOT_UKERNEL_VALID REG_BIT(31) 45 + 43 46 #define GUC_HEADER_INFO XE_REG(0xc014) 44 47 45 48 #define GUC_WOPCM_SIZE XE_REG(0xc050) ··· 86 83 #define GUC_WOPCM_OFFSET_MASK REG_GENMASK(31, GUC_WOPCM_OFFSET_SHIFT) 87 84 #define HUC_LOADING_AGENT_GUC REG_BIT(1) 88 85 #define GUC_WOPCM_OFFSET_VALID REG_BIT(0) 86 + 87 + #define GUC_SRAM_STATUS XE_REG(0xc398) 88 + #define GUC_SRAM_HANDLING_MASK REG_GENMASK(8, 7) 89 + 89 90 #define GUC_MAX_IDLE_COUNT XE_REG(0xc3e4) 91 + #define GUC_IDLE_FLOW_DISABLE REG_BIT(31) 90 92 #define GUC_PMTIMESTAMP_LO XE_REG(0xc3e8) 91 93 #define GUC_PMTIMESTAMP_HI XE_REG(0xc3ec) 92 94
+96 -2
drivers/gpu/drm/xe/tests/xe_gt_sriov_pf_config_kunit.c
··· 11 11 #include "xe_pci_test.h" 12 12 13 13 #define TEST_MAX_VFS 63 14 + #define TEST_VRAM 0x37a800000ull 14 15 15 16 static void pf_set_admin_mode(struct xe_device *xe, bool enable) 16 17 { 17 18 /* should match logic of xe_sriov_pf_admin_only() */ 18 - xe->info.probe_display = !enable; 19 + xe->sriov.pf.admin_only = enable; 19 20 KUNIT_EXPECT_EQ(kunit_get_current_test(), enable, xe_sriov_pf_admin_only(xe)); 21 + } 22 + 23 + static void pf_set_usable_vram(struct xe_device *xe, u64 usable) 24 + { 25 + struct xe_tile *tile = xe_device_get_root_tile(xe); 26 + struct kunit *test = kunit_get_current_test(); 27 + 28 + KUNIT_ASSERT_NOT_ERR_OR_NULL(test, tile); 29 + xe->mem.vram->usable_size = usable; 30 + tile->mem.vram->usable_size = usable; 31 + KUNIT_ASSERT_EQ(test, usable, xe_vram_region_usable_size(tile->mem.vram)); 20 32 } 21 33 22 34 static const void *num_vfs_gen_param(struct kunit *test, const void *prev, char *desc) ··· 46 34 { 47 35 struct xe_pci_fake_data fake = { 48 36 .sriov_mode = XE_SRIOV_MODE_PF, 49 - .platform = XE_TIGERLAKE, /* any random platform with SR-IOV */ 37 + .platform = XE_BATTLEMAGE, /* any random DGFX platform with SR-IOV */ 50 38 .subplatform = XE_SUBPLATFORM_NONE, 39 + .graphics_verx100 = 2001, 51 40 }; 41 + struct xe_vram_region *vram; 52 42 struct xe_device *xe; 53 43 struct xe_gt *gt; 54 44 ··· 63 49 gt = xe_root_mmio_gt(xe); 64 50 KUNIT_ASSERT_NOT_ERR_OR_NULL(test, gt); 65 51 test->priv = gt; 52 + 53 + /* pretend it has some VRAM */ 54 + KUNIT_ASSERT_TRUE(test, IS_DGFX(xe)); 55 + vram = kunit_kzalloc(test, sizeof(*vram), GFP_KERNEL); 56 + KUNIT_ASSERT_NOT_ERR_OR_NULL(test, vram); 57 + vram->usable_size = TEST_VRAM; 58 + xe->mem.vram = vram; 59 + xe->tiles[0].mem.vram = vram; 60 + 61 + /* pretend we have a valid LMTT */ 62 + KUNIT_ASSERT_TRUE(test, xe_device_has_lmtt(xe)); 63 + KUNIT_ASSERT_GE(test, GRAPHICS_VERx100(xe), 1260); 64 + xe->tiles[0].sriov.pf.lmtt.ops = &lmtt_ml_ops; 66 65 67 66 /* pretend it can support up to 63 VFs */ 68 67 xe->sriov.pf.device_total_vfs = TEST_MAX_VFS; ··· 216 189 KUNIT_ASSERT_EQ(test, SZ_2G, pf_profile_fair_ggtt(gt, num_vfs)); 217 190 } 218 191 192 + static const u64 vram_sizes[] = { 193 + SZ_4G - SZ_512M, 194 + SZ_8G + SZ_4G - SZ_512M, 195 + SZ_16G - SZ_512M, 196 + SZ_32G - SZ_512M, 197 + SZ_64G - SZ_512M, 198 + TEST_VRAM, 199 + }; 200 + 201 + static void u64_param_get_desc(const u64 *p, char *desc) 202 + { 203 + string_get_size(*p, 1, STRING_UNITS_2, desc, KUNIT_PARAM_DESC_SIZE); 204 + } 205 + 206 + KUNIT_ARRAY_PARAM(vram_size, vram_sizes, u64_param_get_desc); 207 + 208 + static void fair_vram_1vf(struct kunit *test) 209 + { 210 + const u64 usable = *(const u64 *)test->param_value; 211 + struct xe_gt *gt = test->priv; 212 + struct xe_device *xe = gt_to_xe(gt); 213 + 214 + pf_set_admin_mode(xe, false); 215 + pf_set_usable_vram(xe, usable); 216 + 217 + KUNIT_EXPECT_NE(test, 0, pf_profile_fair_lmem(gt, 1)); 218 + KUNIT_EXPECT_GE(test, usable, pf_profile_fair_lmem(gt, 1)); 219 + KUNIT_EXPECT_TRUE(test, is_power_of_2(pf_profile_fair_lmem(gt, 1))); 220 + KUNIT_EXPECT_GE(test, usable - pf_profile_fair_lmem(gt, 1), pf_profile_fair_lmem(gt, 1)); 221 + } 222 + 223 + static void fair_vram_1vf_admin_only(struct kunit *test) 224 + { 225 + const u64 usable = *(const u64 *)test->param_value; 226 + struct xe_gt *gt = test->priv; 227 + struct xe_device *xe = gt_to_xe(gt); 228 + 229 + pf_set_admin_mode(xe, true); 230 + pf_set_usable_vram(xe, usable); 231 + 232 + KUNIT_EXPECT_NE(test, 0, pf_profile_fair_lmem(gt, 1)); 233 + 
KUNIT_EXPECT_GE(test, usable, pf_profile_fair_lmem(gt, 1)); 234 + KUNIT_EXPECT_LT(test, usable - pf_profile_fair_lmem(gt, 1), pf_profile_fair_lmem(gt, 1)); 235 + KUNIT_EXPECT_TRUE(test, IS_ALIGNED(pf_profile_fair_lmem(gt, 1), SZ_1G)); 236 + } 237 + 238 + static void fair_vram(struct kunit *test) 239 + { 240 + unsigned int num_vfs = (unsigned long)test->param_value; 241 + struct xe_gt *gt = test->priv; 242 + struct xe_device *xe = gt_to_xe(gt); 243 + u64 alignment = pf_get_lmem_alignment(gt); 244 + char size[10]; 245 + 246 + pf_set_admin_mode(xe, false); 247 + 248 + string_get_size(pf_profile_fair_lmem(gt, num_vfs), 1, STRING_UNITS_2, size, sizeof(size)); 249 + kunit_info(test, "fair %s %llx\n", size, pf_profile_fair_lmem(gt, num_vfs)); 250 + 251 + KUNIT_EXPECT_TRUE(test, is_power_of_2(pf_profile_fair_lmem(gt, num_vfs))); 252 + KUNIT_EXPECT_TRUE(test, IS_ALIGNED(pf_profile_fair_lmem(gt, num_vfs), alignment)); 253 + KUNIT_EXPECT_GE(test, TEST_VRAM, num_vfs * pf_profile_fair_lmem(gt, num_vfs)); 254 + } 255 + 219 256 static struct kunit_case pf_gt_config_test_cases[] = { 220 257 KUNIT_CASE(fair_contexts_1vf), 221 258 KUNIT_CASE(fair_doorbells_1vf), 222 259 KUNIT_CASE(fair_ggtt_1vf), 260 + KUNIT_CASE_PARAM(fair_vram_1vf, vram_size_gen_params), 261 + KUNIT_CASE_PARAM(fair_vram_1vf_admin_only, vram_size_gen_params), 223 262 KUNIT_CASE_PARAM(fair_contexts, num_vfs_gen_param), 224 263 KUNIT_CASE_PARAM(fair_doorbells, num_vfs_gen_param), 225 264 KUNIT_CASE_PARAM(fair_ggtt, num_vfs_gen_param), 265 + KUNIT_CASE_PARAM(fair_vram, num_vfs_gen_param), 226 266 {} 227 267 }; 228 268
+1 -5
drivers/gpu/drm/xe/tests/xe_guc_buf_kunit.c
··· 38 38 if (flags & XE_BO_FLAG_GGTT) { 39 39 struct xe_ggtt *ggtt = tile->mem.ggtt; 40 40 41 - bo->ggtt_node[tile->id] = xe_ggtt_node_init(ggtt); 41 + bo->ggtt_node[tile->id] = xe_ggtt_insert_node(ggtt, xe_bo_size(bo), SZ_4K); 42 42 KUNIT_ASSERT_NOT_ERR_OR_NULL(test, bo->ggtt_node[tile->id]); 43 - 44 - KUNIT_ASSERT_EQ(test, 0, 45 - xe_ggtt_node_insert(bo->ggtt_node[tile->id], 46 - xe_bo_size(bo), SZ_4K)); 47 43 } 48 44 49 45 return bo;
+59 -2
drivers/gpu/drm/xe/tests/xe_guc_g2g_test.c
··· 48 48 u32 seqno; 49 49 }; 50 50 51 + static int slot_index_from_gts(struct xe_gt *tx_gt, struct xe_gt *rx_gt) 52 + { 53 + struct xe_device *xe = gt_to_xe(tx_gt); 54 + int idx = 0, found = 0, id, tx_idx, rx_idx; 55 + struct xe_gt *gt; 56 + struct kunit *test = kunit_get_current_test(); 57 + 58 + for (id = 0; id < xe->info.tile_count * xe->info.max_gt_per_tile; id++) { 59 + gt = xe_device_get_gt(xe, id); 60 + if (!gt) 61 + continue; 62 + if (gt == tx_gt) { 63 + tx_idx = idx; 64 + found++; 65 + } 66 + if (gt == rx_gt) { 67 + rx_idx = idx; 68 + found++; 69 + } 70 + 71 + if (found == 2) 72 + break; 73 + 74 + idx++; 75 + } 76 + 77 + if (found != 2) 78 + KUNIT_FAIL(test, "GT index not found"); 79 + 80 + return (tx_idx * xe->info.gt_count) + rx_idx; 81 + } 82 + 51 83 static void g2g_test_send(struct kunit *test, struct xe_guc *guc, 52 84 u32 far_tile, u32 far_dev, 53 85 struct g2g_test_payload *payload) ··· 195 163 goto done; 196 164 } 197 165 198 - idx = (tx_gt->info.id * xe->info.gt_count) + rx_gt->info.id; 166 + idx = slot_index_from_gts(tx_gt, rx_gt); 199 167 200 168 if (xe->g2g_test_array[idx] != payload->seqno - 1) { 201 169 xe_gt_err(rx_gt, "G2G: Seqno mismatch %d vs %d for %d:%d -> %d:%d!\n", ··· 212 180 return ret; 213 181 } 214 182 183 + #define G2G_WAIT_TIMEOUT_MS 100 184 + #define G2G_WAIT_POLL_MS 1 185 + 215 186 /* 216 187 * Send the given seqno from all GuCs to all other GuCs in tile/GT order 217 188 */ 218 189 static void g2g_test_in_order(struct kunit *test, struct xe_device *xe, u32 seqno) 219 190 { 220 191 struct xe_gt *near_gt, *far_gt; 221 - int i, j; 192 + int i, j, waited; 193 + u32 idx; 222 194 223 195 for_each_gt(near_gt, xe, i) { 224 196 u32 near_tile = gt_to_tile(near_gt)->id; ··· 241 205 payload.rx_dev = far_dev; 242 206 payload.rx_tile = far_tile; 243 207 payload.seqno = seqno; 208 + 209 + /* Calculate idx for event-based wait */ 210 + idx = slot_index_from_gts(near_gt, far_gt); 211 + waited = 0; 212 + 213 + /* 214 + * Wait for previous seqno to be acknowledged before sending, 215 + * to avoid queuing too many back-to-back messages and 216 + * causing a test timeout. Actual correctness of message 217 + * will be checked later in xe_guc_g2g_test_notification() 218 + */ 219 + while (xe->g2g_test_array[idx] != (seqno - 1)) { 220 + msleep(G2G_WAIT_POLL_MS); 221 + waited += G2G_WAIT_POLL_MS; 222 + if (waited >= G2G_WAIT_TIMEOUT_MS) { 223 + kunit_info(test, "Timeout waiting! tx gt: %d, rx gt: %d\n", 224 + near_gt->info.id, far_gt->info.id); 225 + break; 226 + } 227 + } 228 + 244 229 g2g_test_send(test, &near_gt->uc.guc, far_tile, far_dev, &payload); 245 230 } 246 231 }
+8
drivers/gpu/drm/xe/tests/xe_pci_test.c
··· 19 19 const struct xe_ip *param = test->param_value; 20 20 const struct xe_graphics_desc *graphics = param->desc; 21 21 u64 mask = graphics->hw_engine_mask; 22 + u8 fuse_regs = graphics->num_geometry_xecore_fuse_regs + 23 + graphics->num_compute_xecore_fuse_regs; 22 24 23 25 /* RCS, CCS, and BCS engines are allowed on the graphics IP */ 24 26 mask &= ~(XE_HW_ENGINE_RCS_MASK | ··· 29 27 30 28 /* Any remaining engines are an error */ 31 29 KUNIT_ASSERT_EQ(test, mask, 0); 30 + 31 + /* 32 + * All graphics IP should have at least one geometry and/or compute 33 + * XeCore fuse register. 34 + */ 35 + KUNIT_ASSERT_GE(test, fuse_regs, 1); 32 36 } 33 37 34 38 static void check_media_ip(struct kunit *test)
+2 -1
drivers/gpu/drm/xe/tests/xe_rtp_test.c
··· 322 322 count_rtp_entries++; 323 323 324 324 xe_rtp_process_ctx_enable_active_tracking(&ctx, &active, count_rtp_entries); 325 - xe_rtp_process_to_sr(&ctx, param->entries, count_rtp_entries, reg_sr); 325 + xe_rtp_process_to_sr(&ctx, param->entries, count_rtp_entries, 326 + reg_sr, false); 326 327 327 328 xa_for_each(&reg_sr->xa, idx, sre) { 328 329 if (idx == param->expected_reg.addr)
+43 -16
drivers/gpu/drm/xe/xe_bb.c
··· 59 59 return ERR_PTR(err); 60 60 } 61 61 62 - struct xe_bb *xe_bb_ccs_new(struct xe_gt *gt, u32 dwords, 63 - enum xe_sriov_vf_ccs_rw_ctxs ctx_id) 62 + /** 63 + * xe_bb_alloc() - Allocate a new batch buffer structure 64 + * @gt: the &xe_gt 65 + * 66 + * Allocates and initializes a new xe_bb structure with an associated 67 + * uninitialized suballoc object. 68 + * 69 + * Returns: Batch buffer structure or an ERR_PTR(-ENOMEM). 70 + */ 71 + struct xe_bb *xe_bb_alloc(struct xe_gt *gt) 64 72 { 65 73 struct xe_bb *bb = kmalloc_obj(*bb); 66 - struct xe_device *xe = gt_to_xe(gt); 67 - struct xe_sa_manager *bb_pool; 68 74 int err; 69 75 70 76 if (!bb) 71 77 return ERR_PTR(-ENOMEM); 78 + 79 + bb->bo = xe_sa_bo_alloc(GFP_KERNEL); 80 + if (IS_ERR(bb->bo)) { 81 + err = PTR_ERR(bb->bo); 82 + goto err; 83 + } 84 + 85 + return bb; 86 + 87 + err: 88 + kfree(bb); 89 + return ERR_PTR(err); 90 + } 91 + 92 + /** 93 + * xe_bb_init() - Initialize a batch buffer with memory from a sub-allocator pool 94 + * @bb: Batch buffer structure to initialize 95 + * @bb_pool: Suballoc memory pool to allocate from 96 + * @dwords: Number of dwords to be allocated 97 + * 98 + * Initializes the batch buffer by allocating memory from the specified 99 + * suballoc pool. 100 + * 101 + * Return: 0 on success, negative error code on failure. 102 + */ 103 + int xe_bb_init(struct xe_bb *bb, struct xe_sa_manager *bb_pool, u32 dwords) 104 + { 105 + int err; 106 + 72 107 /* 73 108 * We need to allocate space for the requested number of dwords & 74 109 * one additional MI_BATCH_BUFFER_END dword. Since the whole SA ··· 111 76 * is not over written when the last chunk of SA is allocated for BB. 112 77 * So, this extra DW acts as a guard here. 113 78 */ 114 - 115 - bb_pool = xe->sriov.vf.ccs.contexts[ctx_id].mem.ccs_bb_pool; 116 - bb->bo = xe_sa_bo_new(bb_pool, 4 * (dwords + 1)); 117 - 118 - if (IS_ERR(bb->bo)) { 119 - err = PTR_ERR(bb->bo); 120 - goto err; 121 - } 79 + err = xe_sa_bo_init(bb_pool, bb->bo, 4 * (dwords + 1)); 80 + if (err) 81 + return err; 122 82 123 83 bb->cs = xe_sa_bo_cpu_addr(bb->bo); 124 84 bb->len = 0; 125 85 126 - return bb; 127 - err: 128 - kfree(bb); 129 - return ERR_PTR(err); 86 + return 0; 130 87 } 131 88 132 89 static struct xe_sched_job *
+3 -3
drivers/gpu/drm/xe/xe_bb.h
··· 12 12 13 13 struct xe_gt; 14 14 struct xe_exec_queue; 15 + struct xe_sa_manager; 15 16 struct xe_sched_job; 16 - enum xe_sriov_vf_ccs_rw_ctxs; 17 17 18 18 struct xe_bb *xe_bb_new(struct xe_gt *gt, u32 dwords, bool usm); 19 - struct xe_bb *xe_bb_ccs_new(struct xe_gt *gt, u32 dwords, 20 - enum xe_sriov_vf_ccs_rw_ctxs ctx_id); 19 + struct xe_bb *xe_bb_alloc(struct xe_gt *gt); 20 + int xe_bb_init(struct xe_bb *bb, struct xe_sa_manager *bb_pool, u32 dwords); 21 21 struct xe_sched_job *xe_bb_create_job(struct xe_exec_queue *q, 22 22 struct xe_bb *bb); 23 23 struct xe_sched_job *xe_bb_create_migration_job(struct xe_exec_queue *q,
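The xe_bb split mirrors the drm_suballoc change above; a rough sketch of the intended call pattern (not part of the patch) follows. The pool pointer and dword count are placeholders, and how a caller releases a bb whose init failed is not shown in this hunk.

static int example_bb_prealloc(struct xe_gt *gt, struct xe_sa_manager *bb_pool,
			       struct xe_bb **out)
{
	struct xe_bb *bb;
	int err;

	/* Allocate the bb and its uninitialized suballoc object up front. */
	bb = xe_bb_alloc(gt);
	if (IS_ERR(bb))
		return PTR_ERR(bb);

	/* Later, in a reclaim-tainted context, back it with pool memory. */
	err = xe_bb_init(bb, bb_pool, 32);
	if (err)
		return err;	/* bb is still owned by the caller here */

	*out = bb;
	return 0;
}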
+2 -2
drivers/gpu/drm/xe/xe_bo.c
··· 512 512 /* 513 513 * Display scanout is always non-coherent with the CPU cache. 514 514 * 515 - * For Xe_LPG and beyond, PPGTT PTE lookups are also 516 - * non-coherent and require a CPU:WC mapping. 515 + * For Xe_LPG and beyond up to NVL-P (excluding), PPGTT PTE 516 + * lookups are also non-coherent and require a CPU:WC mapping. 517 517 */ 518 518 if ((!bo->cpu_caching && bo->flags & XE_BO_FLAG_SCANOUT) || 519 519 (!xe->info.has_cached_pt && bo->flags & XE_BO_FLAG_PAGETABLE))
+64 -1
drivers/gpu/drm/xe/xe_configfs.c
··· 15 15 16 16 #include "instructions/xe_mi_commands.h" 17 17 #include "xe_configfs.h" 18 + #include "xe_defaults.h" 18 19 #include "xe_gt_types.h" 19 20 #include "xe_hw_engine_types.h" 20 21 #include "xe_module.h" ··· 264 263 bool enable_psmi; 265 264 struct { 266 265 unsigned int max_vfs; 266 + bool admin_only_pf; 267 267 } sriov; 268 268 } config; 269 269 ··· 282 280 .survivability_mode = false, 283 281 .enable_psmi = false, 284 282 .sriov = { 285 - .max_vfs = UINT_MAX, 283 + .max_vfs = XE_DEFAULT_MAX_VFS, 284 + .admin_only_pf = XE_DEFAULT_ADMIN_ONLY_PF, 286 285 }, 287 286 }; 288 287 ··· 833 830 834 831 mutex_destroy(&dev->lock); 835 832 833 + kfree(dev->config.ctx_restore_mid_bb[0].cs); 836 834 kfree(dev->config.ctx_restore_post_bb[0].cs); 837 835 kfree(dev); 838 836 } ··· 900 896 return len; 901 897 } 902 898 899 + static ssize_t sriov_admin_only_pf_show(struct config_item *item, char *page) 900 + { 901 + struct xe_config_group_device *dev = to_xe_config_group_device(item->ci_parent); 902 + 903 + guard(mutex)(&dev->lock); 904 + 905 + return sprintf(page, "%s\n", str_yes_no(dev->config.sriov.admin_only_pf)); 906 + } 907 + 908 + static ssize_t sriov_admin_only_pf_store(struct config_item *item, const char *page, size_t len) 909 + { 910 + struct xe_config_group_device *dev = to_xe_config_group_device(item->ci_parent); 911 + bool admin_only_pf; 912 + int ret; 913 + 914 + guard(mutex)(&dev->lock); 915 + 916 + if (is_bound(dev)) 917 + return -EBUSY; 918 + 919 + ret = kstrtobool(page, &admin_only_pf); 920 + if (ret) 921 + return ret; 922 + 923 + dev->config.sriov.admin_only_pf = admin_only_pf; 924 + return len; 925 + } 926 + 903 927 CONFIGFS_ATTR(sriov_, max_vfs); 928 + CONFIGFS_ATTR(sriov_, admin_only_pf); 904 929 905 930 static struct configfs_attribute *xe_config_sriov_attrs[] = { 906 931 &sriov_attr_max_vfs, 932 + &sriov_attr_admin_only_pf, 907 933 NULL, 908 934 }; 909 935 ··· 943 909 struct xe_config_group_device *dev = to_xe_config_group_device(item->ci_parent); 944 910 945 911 if (attr == &sriov_attr_max_vfs && dev->mode != XE_SRIOV_MODE_PF) 912 + return false; 913 + if (attr == &sriov_attr_admin_only_pf && dev->mode != XE_SRIOV_MODE_PF) 946 914 return false; 947 915 948 916 return true; ··· 1099 1063 PRI_CUSTOM_ATTR("%llx", engines_allowed); 1100 1064 PRI_CUSTOM_ATTR("%d", enable_psmi); 1101 1065 PRI_CUSTOM_ATTR("%d", survivability_mode); 1066 + PRI_CUSTOM_ATTR("%u", sriov.admin_only_pf); 1102 1067 1103 1068 #undef PRI_CUSTOM_ATTR 1104 1069 } ··· 1278 1241 } 1279 1242 1280 1243 #ifdef CONFIG_PCI_IOV 1244 + /** 1245 + * xe_configfs_admin_only_pf() - Get PF's operational mode. 1246 + * @pdev: the &pci_dev device 1247 + * 1248 + * Find the configfs group that belongs to the PCI device and return a flag 1249 + * whether the PF driver should be dedicated for VFs management only. 1250 + * 1251 + * If configfs group is not present, use driver's default value. 1252 + * 1253 + * Return: true if PF driver is dedicated for VFs administration only. 
1254 + */ 1255 + bool xe_configfs_admin_only_pf(struct pci_dev *pdev) 1256 + { 1257 + struct xe_config_group_device *dev = find_xe_config_group_device(pdev); 1258 + bool admin_only_pf; 1259 + 1260 + if (!dev) 1261 + return XE_DEFAULT_ADMIN_ONLY_PF; 1262 + 1263 + scoped_guard(mutex, &dev->lock) 1264 + admin_only_pf = dev->config.sriov.admin_only_pf; 1265 + 1266 + config_group_put(&dev->group); 1267 + 1268 + return admin_only_pf; 1269 + } 1281 1270 /** 1282 1271 * xe_configfs_get_max_vfs() - Get number of VFs that could be managed 1283 1272 * @pdev: the &pci_dev device
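As a rough illustration (not part of the patch), the new helper is meant to be read once at probe time, since the configfs attribute is rejected with -EBUSY while the device is bound. Caching it in xe->sriov.pf.admin_only matches what the KUnit test above pokes, but the exact wiring into xe_sriov_pf_admin_only() is an assumption here.

static void example_read_admin_only_pf(struct xe_device *xe, struct pci_dev *pdev)
{
	/* Assumed cache location; the KUnit helper writes the same field. */
	xe->sriov.pf.admin_only = xe_configfs_admin_only_pf(pdev);
}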
+14 -2
drivers/gpu/drm/xe/xe_configfs.h
··· 8 8 #include <linux/limits.h> 9 9 #include <linux/types.h> 10 10 11 - #include <xe_hw_engine_types.h> 11 + #include "xe_defaults.h" 12 + #include "xe_hw_engine_types.h" 13 + #include "xe_module.h" 12 14 13 15 struct pci_dev; 14 16 ··· 31 29 const u32 **cs); 32 30 #ifdef CONFIG_PCI_IOV 33 31 unsigned int xe_configfs_get_max_vfs(struct pci_dev *pdev); 32 + bool xe_configfs_admin_only_pf(struct pci_dev *pdev); 34 33 #endif 35 34 #else 36 35 static inline int xe_configfs_init(void) { return 0; } ··· 48 45 static inline u32 xe_configfs_get_ctx_restore_post_bb(struct pci_dev *pdev, 49 46 enum xe_engine_class class, 50 47 const u32 **cs) { return 0; } 51 - static inline unsigned int xe_configfs_get_max_vfs(struct pci_dev *pdev) { return UINT_MAX; } 48 + #ifdef CONFIG_PCI_IOV 49 + static inline unsigned int xe_configfs_get_max_vfs(struct pci_dev *pdev) 50 + { 51 + return xe_modparam.max_vfs; 52 + } 53 + static inline bool xe_configfs_admin_only_pf(struct pci_dev *pdev) 54 + { 55 + return XE_DEFAULT_ADMIN_ONLY_PF; 56 + } 57 + #endif 52 58 #endif 53 59 54 60 #endif
+26
drivers/gpu/drm/xe/xe_defaults.h
··· 1 + /* SPDX-License-Identifier: MIT */ 2 + /* 3 + * Copyright © 2026 Intel Corporation 4 + */ 5 + #ifndef _XE_DEFAULTS_H_ 6 + #define _XE_DEFAULTS_H_ 7 + 8 + #include "xe_device_types.h" 9 + 10 + #if IS_ENABLED(CONFIG_DRM_XE_DEBUG) 11 + #define XE_DEFAULT_GUC_LOG_LEVEL 3 12 + #else 13 + #define XE_DEFAULT_GUC_LOG_LEVEL 1 14 + #endif 15 + 16 + #define XE_DEFAULT_PROBE_DISPLAY IS_ENABLED(CONFIG_DRM_XE_DISPLAY) 17 + #define XE_DEFAULT_VRAM_BAR_SIZE 0 18 + #define XE_DEFAULT_FORCE_PROBE CONFIG_DRM_XE_FORCE_PROBE 19 + #define XE_DEFAULT_MAX_VFS ~0 20 + #define XE_DEFAULT_MAX_VFS_STR "unlimited" 21 + #define XE_DEFAULT_ADMIN_ONLY_PF false 22 + #define XE_DEFAULT_WEDGED_MODE XE_WEDGED_MODE_UPON_CRITICAL_ERROR 23 + #define XE_DEFAULT_WEDGED_MODE_STR "upon-critical-error" 24 + #define XE_DEFAULT_SVM_NOTIFIER_SIZE 512 25 + 26 + #endif
+1 -1
drivers/gpu/drm/xe/xe_devcoredump.c
··· 356 356 357 357 xe_engine_snapshot_capture_for_queue(q); 358 358 359 - queue_work(system_unbound_wq, &ss->work); 359 + queue_work(system_dfl_wq, &ss->work); 360 360 361 361 dma_fence_end_signalling(cookie); 362 362 }
+39 -16
drivers/gpu/drm/xe/xe_device.c
··· 26 26 #include "xe_bo.h" 27 27 #include "xe_bo_evict.h" 28 28 #include "xe_debugfs.h" 29 + #include "xe_defaults.h" 29 30 #include "xe_devcoredump.h" 30 31 #include "xe_device_sysfs.h" 31 32 #include "xe_dma_buf.h" ··· 456 455 xe->drm.anon_inode->i_mapping, 457 456 xe->drm.vma_offset_manager, 0); 458 457 if (WARN_ON(err)) 459 - goto err; 458 + return ERR_PTR(err); 460 459 461 460 xe_bo_dev_init(&xe->bo_device); 462 461 err = drmm_add_action_or_reset(&xe->drm, xe_device_destroy, NULL); 463 462 if (err) 464 - goto err; 463 + return ERR_PTR(err); 465 464 466 465 err = xe_shrinker_create(xe); 467 466 if (err) 468 - goto err; 467 + return ERR_PTR(err); 469 468 470 469 xe->info.devid = pdev->device; 471 470 xe->info.revid = pdev->revision; ··· 475 474 476 475 err = xe_irq_init(xe); 477 476 if (err) 478 - goto err; 477 + return ERR_PTR(err); 479 478 480 479 xe_validation_device_init(&xe->val); 481 480 ··· 485 484 486 485 err = xe_pagemap_shrinker_create(xe); 487 486 if (err) 488 - goto err; 487 + return ERR_PTR(err); 489 488 490 489 xa_init_flags(&xe->usm.asid_to_vm, XA_FLAGS_ALLOC); 491 490 ··· 504 503 505 504 err = xe_bo_pinned_init(xe); 506 505 if (err) 507 - goto err; 506 + return ERR_PTR(err); 508 507 509 508 xe->preempt_fence_wq = alloc_ordered_workqueue("xe-preempt-fence-wq", 510 509 WQ_MEM_RECLAIM); 511 510 xe->ordered_wq = alloc_ordered_workqueue("xe-ordered-wq", 0); 512 - xe->unordered_wq = alloc_workqueue("xe-unordered-wq", 0, 0); 513 - xe->destroy_wq = alloc_workqueue("xe-destroy-wq", 0, 0); 511 + xe->unordered_wq = alloc_workqueue("xe-unordered-wq", WQ_PERCPU, 0); 512 + xe->destroy_wq = alloc_workqueue("xe-destroy-wq", WQ_PERCPU, 0); 514 513 if (!xe->ordered_wq || !xe->unordered_wq || 515 514 !xe->preempt_fence_wq || !xe->destroy_wq) { 516 515 /* ··· 518 517 * drmm_add_action_or_reset register above 519 518 */ 520 519 drm_err(&xe->drm, "Failed to allocate xe workqueues\n"); 521 - err = -ENOMEM; 522 - goto err; 520 + return ERR_PTR(-ENOMEM); 523 521 } 524 522 525 523 err = drmm_mutex_init(&xe->drm, &xe->pmt.lock); 526 524 if (err) 527 - goto err; 525 + return ERR_PTR(err); 528 526 529 527 return xe; 530 - 531 - err: 532 - return ERR_PTR(err); 533 528 } 534 529 ALLOW_ERROR_INJECTION(xe_device_create, ERRNO); /* See xe_pci_probe() */ 535 530 ··· 740 743 assert_lmem_ready(xe); 741 744 742 745 xe->wedged.mode = xe_device_validate_wedged_mode(xe, xe_modparam.wedged_mode) ? 743 - XE_WEDGED_MODE_DEFAULT : xe_modparam.wedged_mode; 746 + XE_DEFAULT_WEDGED_MODE : xe_modparam.wedged_mode; 744 747 drm_dbg(&xe->drm, "wedged_mode: setting mode (%u) %s\n", 745 748 xe->wedged.mode, xe_wedged_mode_to_string(xe->wedged.mode)); 746 749 ··· 1308 1311 xe->needs_flr_on_fini = true; 1309 1312 drm_err(&xe->drm, 1310 1313 "CRITICAL: Xe has declared device %s as wedged.\n" 1311 - "IOCTLs and executions are blocked. Only a rebind may clear the failure\n" 1314 + "IOCTLs and executions are blocked.\n" 1315 + "For recovery procedure, refer to https://docs.kernel.org/gpu/drm-uapi.html#device-wedging\n" 1312 1316 "Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/xe/kernel/issues/new\n", 1313 1317 dev_name(xe->drm.dev)); 1314 1318 } ··· 1371 1373 default: 1372 1374 return "<invalid>"; 1373 1375 } 1376 + } 1377 + 1378 + /** 1379 + * xe_device_asid_to_vm() - Find VM from ASID 1380 + * @xe: the &xe_device 1381 + * @asid: Address space ID 1382 + * 1383 + * Find a VM from ASID and take a reference to VM which caller must drop. 1384 + * Reclaim safe. 
1385 + * 1386 + * Return: VM on success, ERR_PTR on failure 1387 + */ 1388 + struct xe_vm *xe_device_asid_to_vm(struct xe_device *xe, u32 asid) 1389 + { 1390 + struct xe_vm *vm; 1391 + 1392 + down_read(&xe->usm.lock); 1393 + vm = xa_load(&xe->usm.asid_to_vm, asid); 1394 + if (vm) 1395 + xe_vm_get(vm); 1396 + else 1397 + vm = ERR_PTR(-EINVAL); 1398 + up_read(&xe->usm.lock); 1399 + 1400 + return vm; 1374 1401 }
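A small caller sketch (not part of the patch) for the new ASID lookup; the only contract relied on is the kernel-doc above: on success a VM reference is taken that the caller must drop with xe_vm_put(). The fault-handling body is hypothetical.

static int example_handle_asid(struct xe_device *xe, u32 asid)
{
	struct xe_vm *vm;

	vm = xe_device_asid_to_vm(xe, asid);
	if (IS_ERR(vm))
		return PTR_ERR(vm);

	/* ... resolve the fault or query against this VM ... */

	xe_vm_put(vm);
	return 0;
}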
+9 -7
drivers/gpu/drm/xe/xe_device.h
··· 12 12 #include "xe_gt_types.h" 13 13 #include "xe_sriov.h" 14 14 15 + struct xe_vm; 16 + 15 17 static inline struct xe_device *to_xe_device(const struct drm_device *dev) 16 18 { 17 19 return container_of(dev, struct xe_device, drm); ··· 62 60 return &xe->tiles[0]; 63 61 } 64 62 65 - /* 66 - * Highest GT/tile count for any platform. Used only for memory allocation 67 - * sizing. Any logic looping over GTs or mapping userspace GT IDs into GT 68 - * structures should use the per-platform xe->info.max_gt_per_tile instead. 69 - */ 70 - #define XE_MAX_GT_PER_TILE 2 71 - 72 63 static inline struct xe_gt *xe_device_get_gt(struct xe_device *xe, u8 gt_id) 73 64 { 74 65 struct xe_tile *tile; ··· 107 112 static inline struct xe_gt *xe_root_mmio_gt(struct xe_device *xe) 108 113 { 109 114 return xe_device_get_root_tile(xe)->primary_gt; 115 + } 116 + 117 + static inline struct xe_mmio *xe_root_tile_mmio(struct xe_device *xe) 118 + { 119 + return &xe->tiles[0].mmio; 110 120 } 111 121 112 122 static inline bool xe_device_uc_enabled(struct xe_device *xe) ··· 203 203 int xe_is_injection_active(void); 204 204 205 205 bool xe_is_xe_file(const struct file *file); 206 + 207 + struct xe_vm *xe_device_asid_to_vm(struct xe_device *xe, u32 asid); 206 208 207 209 /* 208 210 * Occasionally it is seen that the G2H worker starts running after a delay of more than
+17 -173
drivers/gpu/drm/xe/xe_device_types.h
··· 15 15 #include "xe_devcoredump_types.h" 16 16 #include "xe_heci_gsc.h" 17 17 #include "xe_late_bind_fw_types.h" 18 - #include "xe_lmtt_types.h" 19 - #include "xe_memirq_types.h" 20 - #include "xe_mert.h" 21 18 #include "xe_oa_types.h" 22 19 #include "xe_pagefault_types.h" 23 20 #include "xe_platform_types.h" ··· 26 29 #include "xe_sriov_vf_ccs_types.h" 27 30 #include "xe_step_types.h" 28 31 #include "xe_survivability_mode_types.h" 29 - #include "xe_tile_sriov_vf_types.h" 32 + #include "xe_tile_types.h" 30 33 #include "xe_validation.h" 31 34 32 35 #if IS_ENABLED(CONFIG_DRM_XE_DEBUG) 33 36 #define TEST_VM_OPS_ERROR 34 37 #endif 35 38 36 - struct dram_info; 37 39 struct drm_pagemap_shrinker; 38 40 struct intel_display; 39 41 struct intel_dg_nvm_dev; ··· 58 62 XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET = 2, 59 63 }; 60 64 61 - #define XE_WEDGED_MODE_DEFAULT XE_WEDGED_MODE_UPON_CRITICAL_ERROR 62 - #define XE_WEDGED_MODE_DEFAULT_STR "upon-critical-error" 63 - 64 65 #define XE_BO_INVALID_OFFSET LONG_MAX 65 66 66 67 #define GRAPHICS_VER(xe) ((xe)->info.graphics_verx100 / 100) ··· 72 79 #define XE_GT1 1 73 80 #define XE_MAX_TILES_PER_DEVICE (XE_GT1 + 1) 74 81 82 + /* 83 + * Highest GT/tile count for any platform. Used only for memory allocation 84 + * sizing. Any logic looping over GTs or mapping userspace GT IDs into GT 85 + * structures should use the per-platform xe->info.max_gt_per_tile instead. 86 + */ 87 + #define XE_MAX_GT_PER_TILE 2 88 + 75 89 #define XE_MAX_ASID (BIT(20)) 76 90 77 91 #define IS_PLATFORM_STEP(_xe, _platform, min_step, max_step) \ ··· 90 90 (_xe)->info.subplatform == (sub) && \ 91 91 (_xe)->info.step.graphics >= (min_step) && \ 92 92 (_xe)->info.step.graphics < (max_step)) 93 - 94 - #define tile_to_xe(tile__) \ 95 - _Generic(tile__, \ 96 - const struct xe_tile * : (const struct xe_device *)((tile__)->xe), \ 97 - struct xe_tile * : (tile__)->xe) 98 - 99 - /** 100 - * struct xe_mmio - register mmio structure 101 - * 102 - * Represents an MMIO region that the CPU may use to access registers. A 103 - * region may share its IO map with other regions (e.g., all GTs within a 104 - * tile share the same map with their parent tile, but represent different 105 - * subregions of the overall IO space). 106 - */ 107 - struct xe_mmio { 108 - /** @tile: Backpointer to tile, used for tracing */ 109 - struct xe_tile *tile; 110 - 111 - /** @regs: Map used to access registers. */ 112 - void __iomem *regs; 113 - 114 - /** 115 - * @sriov_vf_gt: Backpointer to GT. 116 - * 117 - * This pointer is only set for GT MMIO regions and only when running 118 - * as an SRIOV VF structure 119 - */ 120 - struct xe_gt *sriov_vf_gt; 121 - 122 - /** 123 - * @regs_size: Length of the register region within the map. 124 - * 125 - * The size of the iomap set in *regs is generally larger than the 126 - * register mmio space since it includes unused regions and/or 127 - * non-register regions such as the GGTT PTEs. 128 - */ 129 - size_t regs_size; 130 - 131 - /** @adj_limit: adjust MMIO address if address is below this value */ 132 - u32 adj_limit; 133 - 134 - /** @adj_offset: offset to add to MMIO address when adjusting */ 135 - u32 adj_offset; 136 - }; 137 - 138 - /** 139 - * struct xe_tile - hardware tile structure 140 - * 141 - * From a driver perspective, a "tile" is effectively a complete GPU, containing 142 - * an SGunit, 1-2 GTs, and (for discrete platforms) VRAM. 
143 - * 144 - * Multi-tile platforms effectively bundle multiple GPUs behind a single PCI 145 - * device and designate one "root" tile as being responsible for external PCI 146 - * communication. PCI BAR0 exposes the GGTT and MMIO register space for each 147 - * tile in a stacked layout, and PCI BAR2 exposes the local memory associated 148 - * with each tile similarly. Device-wide interrupts can be enabled/disabled 149 - * at the root tile, and the MSTR_TILE_INTR register will report which tiles 150 - * have interrupts that need servicing. 151 - */ 152 - struct xe_tile { 153 - /** @xe: Backpointer to tile's PCI device */ 154 - struct xe_device *xe; 155 - 156 - /** @id: ID of the tile */ 157 - u8 id; 158 - 159 - /** 160 - * @primary_gt: Primary GT 161 - */ 162 - struct xe_gt *primary_gt; 163 - 164 - /** 165 - * @media_gt: Media GT 166 - * 167 - * Only present on devices with media version >= 13. 168 - */ 169 - struct xe_gt *media_gt; 170 - 171 - /** 172 - * @mmio: MMIO info for a tile. 173 - * 174 - * Each tile has its own 16MB space in BAR0, laid out as: 175 - * * 0-4MB: registers 176 - * * 4MB-8MB: reserved 177 - * * 8MB-16MB: global GTT 178 - */ 179 - struct xe_mmio mmio; 180 - 181 - /** @mem: memory management info for tile */ 182 - struct { 183 - /** 184 - * @mem.kernel_vram: kernel-dedicated VRAM info for tile. 185 - * 186 - * Although VRAM is associated with a specific tile, it can 187 - * still be accessed by all tiles' GTs. 188 - */ 189 - struct xe_vram_region *kernel_vram; 190 - 191 - /** 192 - * @mem.vram: general purpose VRAM info for tile. 193 - * 194 - * Although VRAM is associated with a specific tile, it can 195 - * still be accessed by all tiles' GTs. 196 - */ 197 - struct xe_vram_region *vram; 198 - 199 - /** @mem.ggtt: Global graphics translation table */ 200 - struct xe_ggtt *ggtt; 201 - 202 - /** 203 - * @mem.kernel_bb_pool: Pool from which batchbuffers are allocated. 204 - * 205 - * Media GT shares a pool with its primary GT. 206 - */ 207 - struct xe_sa_manager *kernel_bb_pool; 208 - 209 - /** 210 - * @mem.reclaim_pool: Pool for PRLs allocated. 211 - * 212 - * Only main GT has page reclaim list allocations. 213 - */ 214 - struct xe_sa_manager *reclaim_pool; 215 - } mem; 216 - 217 - /** @sriov: tile level virtualization data */ 218 - union { 219 - struct { 220 - /** @sriov.pf.lmtt: Local Memory Translation Table. */ 221 - struct xe_lmtt lmtt; 222 - } pf; 223 - struct { 224 - /** @sriov.vf.ggtt_balloon: GGTT regions excluded from use. */ 225 - struct xe_ggtt_node *ggtt_balloon[2]; 226 - /** @sriov.vf.self_config: VF configuration data */ 227 - struct xe_tile_sriov_vf_selfconfig self_config; 228 - } vf; 229 - } sriov; 230 - 231 - /** @memirq: Memory Based Interrupts. 
*/ 232 - struct xe_memirq memirq; 233 - 234 - /** @csc_hw_error_work: worker to report CSC HW errors */ 235 - struct work_struct csc_hw_error_work; 236 - 237 - /** @pcode: tile's PCODE */ 238 - struct { 239 - /** @pcode.lock: protecting tile's PCODE mailbox data */ 240 - struct mutex lock; 241 - } pcode; 242 - 243 - /** @migrate: Migration helper for vram blits and clearing */ 244 - struct xe_migrate *migrate; 245 - 246 - /** @sysfs: sysfs' kobj used by xe_tile_sysfs */ 247 - struct kobject *sysfs; 248 - 249 - /** @debugfs: debugfs directory associated with this tile */ 250 - struct dentry *debugfs; 251 - 252 - /** @mert: MERT-related data */ 253 - struct xe_mert mert; 254 - }; 255 93 256 94 /** 257 95 * struct xe_device - Top level struct of Xe device ··· 138 300 u8 tile_count; 139 301 /** @info.max_gt_per_tile: Number of GT IDs allocated to each tile */ 140 302 u8 max_gt_per_tile; 303 + /** @info.multi_lrc_mask: bitmask of engine classes which support multi-lrc */ 304 + u8 multi_lrc_mask; 141 305 /** @info.gt_count: Total number of GTs for entire device */ 142 306 u8 gt_count; 143 307 /** @info.vm_max_level: Max VM level */ ··· 193 353 u8 has_pre_prod_wa:1; 194 354 /** @info.has_pxp: Device has PXP support */ 195 355 u8 has_pxp:1; 356 + /** @info.has_ctx_tlb_inval: Has context based TLB invalidations */ 357 + u8 has_ctx_tlb_inval:1; 196 358 /** @info.has_range_tlb_inval: Has range based TLB invalidations */ 197 359 u8 has_range_tlb_inval:1; 198 360 /** @info.has_soc_remapper_sysctrl: Has SoC remapper system controller */ ··· 401 559 const struct xe_pat_table_entry *table; 402 560 /** @pat.n_entries: Number of PAT entries */ 403 561 int n_entries; 404 - /** @pat.ats_entry: PAT entry for PCIe ATS responses */ 562 + /** @pat.pat_ats: PAT entry for PCIe ATS responses */ 405 563 const struct xe_pat_table_entry *pat_ats; 406 - /** @pat.pta_entry: PAT entry for page table accesses */ 407 - const struct xe_pat_table_entry *pat_pta; 564 + /** @pat.pat_primary_pta: primary GT PAT entry for page table accesses */ 565 + const struct xe_pat_table_entry *pat_primary_pta; 566 + /** @pat.pat_media_pta: media GT PAT entry for page table accesses */ 567 + const struct xe_pat_table_entry *pat_media_pta; 408 568 u32 idx[__XE_CACHE_LEVEL_COUNT]; 409 569 } pat; 410 570
+96 -40
drivers/gpu/drm/xe/xe_exec_queue.c
··· 152 152 if (xe_exec_queue_is_multi_queue(q)) 153 153 xe_exec_queue_group_cleanup(q); 154 154 155 - if (q->vm) 155 + if (q->vm) { 156 + xe_vm_remove_exec_queue(q->vm, q); 156 157 xe_vm_put(q->vm); 158 + } 157 159 158 160 if (q->xef) 159 161 xe_file_put(q->xef); ··· 226 224 q->ring_ops = gt->ring_ops[hwe->class]; 227 225 q->ops = gt->exec_queue_ops; 228 226 INIT_LIST_HEAD(&q->lr.link); 227 + INIT_LIST_HEAD(&q->vm_exec_queue_link); 229 228 INIT_LIST_HEAD(&q->multi_gt_link); 230 229 INIT_LIST_HEAD(&q->hw_engine_group_link); 231 230 INIT_LIST_HEAD(&q->pxp.link); 231 + spin_lock_init(&q->multi_queue.lock); 232 + spin_lock_init(&q->lrc_lookup_lock); 232 233 q->multi_queue.priority = XE_MULTI_QUEUE_PRIORITY_NORMAL; 233 234 234 235 q->sched_props.timeslice_us = hwe->eclass->sched_props.timeslice_us; ··· 271 266 return q; 272 267 } 273 268 269 + static void xe_exec_queue_set_lrc(struct xe_exec_queue *q, struct xe_lrc *lrc, u16 idx) 270 + { 271 + xe_assert(gt_to_xe(q->gt), idx < q->width); 272 + 273 + scoped_guard(spinlock, &q->lrc_lookup_lock) 274 + q->lrc[idx] = lrc; 275 + } 276 + 277 + /** 278 + * xe_exec_queue_get_lrc() - Get the LRC from exec queue. 279 + * @q: The exec queue instance. 280 + * @idx: Index within multi-LRC array. 281 + * 282 + * Retrieves LRC of given index for the exec queue under lock 283 + * and takes reference. 284 + * 285 + * Return: Pointer to LRC on success, error on failure, NULL on 286 + * lookup failure. 287 + */ 288 + struct xe_lrc *xe_exec_queue_get_lrc(struct xe_exec_queue *q, u16 idx) 289 + { 290 + struct xe_lrc *lrc; 291 + 292 + xe_assert(gt_to_xe(q->gt), idx < q->width); 293 + 294 + scoped_guard(spinlock, &q->lrc_lookup_lock) { 295 + lrc = q->lrc[idx]; 296 + if (lrc) 297 + xe_lrc_get(lrc); 298 + } 299 + 300 + return lrc; 301 + } 302 + 303 + /** 304 + * xe_exec_queue_lrc() - Get the LRC from exec queue. 305 + * @q: The exec queue instance. 306 + * 307 + * Retrieves the primary LRC for the exec queue. Note that this function 308 + * returns only the first LRC instance, even when multiple parallel LRCs 309 + * are configured. This function does not increment reference count, 310 + * so the reference can be just forgotten after use. 311 + * 312 + * Return: Pointer to LRC on success, error on failure 313 + */ 314 + struct xe_lrc *xe_exec_queue_lrc(struct xe_exec_queue *q) 315 + { 316 + return q->lrc[0]; 317 + } 318 + 319 + static void __xe_exec_queue_fini(struct xe_exec_queue *q) 320 + { 321 + int i; 322 + 323 + q->ops->fini(q); 324 + 325 + for (i = 0; i < q->width; ++i) 326 + xe_lrc_put(q->lrc[i]); 327 + } 328 + 274 329 static int __xe_exec_queue_init(struct xe_exec_queue *q, u32 exec_queue_flags) 275 330 { 276 331 int i, err; ··· 368 303 * from the moment vCPU resumes execution. 
369 304 */ 370 305 for (i = 0; i < q->width; ++i) { 371 - struct xe_lrc *lrc; 306 + struct xe_lrc *__lrc = NULL; 307 + int marker; 372 308 373 - xe_gt_sriov_vf_wait_valid_ggtt(q->gt); 374 - lrc = xe_lrc_create(q->hwe, q->vm, q->replay_state, 375 - xe_lrc_ring_size(), q->msix_vec, flags); 376 - if (IS_ERR(lrc)) { 377 - err = PTR_ERR(lrc); 378 - goto err_lrc; 379 - } 309 + do { 310 + struct xe_lrc *lrc; 380 311 381 - /* Pairs with READ_ONCE to xe_exec_queue_contexts_hwsp_rebase */ 382 - WRITE_ONCE(q->lrc[i], lrc); 312 + marker = xe_gt_sriov_vf_wait_valid_ggtt(q->gt); 313 + 314 + lrc = xe_lrc_create(q->hwe, q->vm, q->replay_state, 315 + xe_lrc_ring_size(), q->msix_vec, flags); 316 + if (IS_ERR(lrc)) { 317 + err = PTR_ERR(lrc); 318 + goto err_lrc; 319 + } 320 + 321 + xe_exec_queue_set_lrc(q, lrc, i); 322 + 323 + if (__lrc) 324 + xe_lrc_put(__lrc); 325 + __lrc = lrc; 326 + 327 + } while (marker != xe_vf_migration_fixups_complete_count(q->gt)); 383 328 } 384 329 385 330 return 0; 386 331 387 332 err_lrc: 388 - for (i = i - 1; i >= 0; --i) 389 - xe_lrc_put(q->lrc[i]); 333 + __xe_exec_queue_fini(q); 390 334 return err; 391 - } 392 - 393 - static void __xe_exec_queue_fini(struct xe_exec_queue *q) 394 - { 395 - int i; 396 - 397 - q->ops->fini(q); 398 - 399 - for (i = 0; i < q->width; ++i) 400 - xe_lrc_put(q->lrc[i]); 401 335 } 402 336 403 337 struct xe_exec_queue *xe_exec_queue_create(struct xe_device *xe, struct xe_vm *vm, ··· 1244 1180 if (XE_IOCTL_DBG(xe, !hwe)) 1245 1181 return -EINVAL; 1246 1182 1183 + /* multi-lrc is only supported on select engine classes */ 1184 + if (XE_IOCTL_DBG(xe, args->width > 1 && 1185 + !(xe->info.multi_lrc_mask & BIT(hwe->class)))) 1186 + return -EOPNOTSUPP; 1187 + 1247 1188 vm = xe_vm_lookup(xef, args->vm_id); 1248 1189 if (XE_IOCTL_DBG(xe, !vm)) 1249 1190 return -ENOENT; ··· 1302 1233 } 1303 1234 1304 1235 q->xef = xe_file_get(xef); 1236 + if (eci[0].engine_class != DRM_XE_ENGINE_CLASS_VM_BIND) 1237 + xe_vm_add_exec_queue(vm, q); 1305 1238 1306 1239 /* user id alloc must always be last in ioctl to prevent UAF */ 1307 1240 err = xa_alloc(&xef->exec_queue.xa, &id, q, xa_limit_32b, GFP_KERNEL); ··· 1352 1281 xe_exec_queue_put(q); 1353 1282 1354 1283 return ret; 1355 - } 1356 - 1357 - /** 1358 - * xe_exec_queue_lrc() - Get the LRC from exec queue. 1359 - * @q: The exec_queue. 1360 - * 1361 - * Retrieves the primary LRC for the exec queue. Note that this function 1362 - * returns only the first LRC instance, even when multiple parallel LRCs 1363 - * are configured. 1364 - * 1365 - * Return: Pointer to LRC on success, error on failure 1366 - */ 1367 - struct xe_lrc *xe_exec_queue_lrc(struct xe_exec_queue *q) 1368 - { 1369 - return q->lrc[0]; 1370 1284 } 1371 1285 1372 1286 /** ··· 1713 1657 for (i = 0; i < q->width; ++i) { 1714 1658 struct xe_lrc *lrc; 1715 1659 1716 - /* Pairs with WRITE_ONCE in __xe_exec_queue_init */ 1717 - lrc = READ_ONCE(q->lrc[i]); 1660 + lrc = xe_exec_queue_get_lrc(q, i); 1718 1661 if (!lrc) 1719 1662 continue; 1720 1663 1721 1664 xe_lrc_update_memirq_regs_with_address(lrc, q->hwe, scratch); 1722 1665 xe_lrc_update_hwctx_regs_with_address(lrc); 1723 1666 err = xe_lrc_setup_wa_bb_with_scratch(lrc, q->hwe, scratch); 1667 + xe_lrc_put(lrc); 1724 1668 if (err) 1725 1669 break; 1726 1670 }
+1
drivers/gpu/drm/xe/xe_exec_queue.h
··· 160 160 int xe_exec_queue_contexts_hwsp_rebase(struct xe_exec_queue *q, void *scratch); 161 161 162 162 struct xe_lrc *xe_exec_queue_lrc(struct xe_exec_queue *q); 163 + struct xe_lrc *xe_exec_queue_get_lrc(struct xe_exec_queue *q, u16 idx); 163 164 164 165 /** 165 166 * xe_exec_queue_idle_skip_suspend() - Can exec queue skip suspend
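xe_exec_queue_get_lrc() takes a reference on the looked-up LRC under q->lrc_lookup_lock, so every successful lookup has to be balanced with xe_lrc_put(); the hwsp-rebase loop in the .c changes above is the in-tree user. A minimal sketch of that get/put discipline follows; the surrounding function and the -ENOENT choice are illustrative only.

/* Sketch of the get/put discipline around the new lookup helper. */
static int touch_lrc(struct xe_exec_queue *q, u16 idx)
{
	struct xe_lrc *lrc = xe_exec_queue_get_lrc(q, idx);

	if (!lrc)
		return -ENOENT;	/* slot not (or no longer) populated */

	/* ... work on lrc while the reference is held ... */

	xe_lrc_put(lrc);
	return 0;
}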
+18 -1
drivers/gpu/drm/xe/xe_exec_queue_types.h
··· 66 66 bool sync_pending; 67 67 /** @banned: Group banned */ 68 68 bool banned; 69 + /** @stopped: Group is stopped, protected by list_lock */ 70 + bool stopped; 69 71 }; 70 72 71 73 /** ··· 161 159 struct xe_exec_queue_group *group; 162 160 /** @multi_queue.link: Link into group's secondary queues list */ 163 161 struct list_head link; 164 - /** @multi_queue.priority: Queue priority within the multi-queue group */ 162 + /** 163 + * @multi_queue.priority: Queue priority within the multi-queue group. 164 + * It is protected by @multi_queue.lock. 165 + */ 165 166 enum xe_multi_queue_priority priority; 167 + /** @multi_queue.lock: Lock for protecting certain members */ 168 + spinlock_t lock; 166 169 /** @multi_queue.pos: Position of queue within the multi-queue group */ 167 170 u8 pos; 168 171 /** @multi_queue.valid: Queue belongs to a multi queue group */ ··· 218 211 struct dma_fence *last_fence; 219 212 } tlb_inval[XE_EXEC_QUEUE_TLB_INVAL_COUNT]; 220 213 214 + /** @vm_exec_queue_link: Link to track exec queue within a VM's list of exec queues. */ 215 + struct list_head vm_exec_queue_link; 216 + 221 217 /** @pxp: PXP info tracking */ 222 218 struct { 223 219 /** @pxp.type: PXP session type used by this queue */ ··· 257 247 u64 tlb_flush_seqno; 258 248 /** @hw_engine_group_link: link into exec queues in the same hw engine group */ 259 249 struct list_head hw_engine_group_link; 250 + /** 251 + * @lrc_lookup_lock: Lock for protecting lrc array access. Only used when 252 + * running in parallel to queue creation is possible. 253 + */ 254 + spinlock_t lrc_lookup_lock; 260 255 /** @lrc: logical ring context for this exec queue */ 261 256 struct xe_lrc *lrc[] __counted_by(width); 262 257 }; ··· 316 301 void (*resume)(struct xe_exec_queue *q); 317 302 /** @reset_status: check exec queue reset status */ 318 303 bool (*reset_status)(struct xe_exec_queue *q); 304 + /** @active: check exec queue is active */ 305 + bool (*active)(struct xe_exec_queue *q); 319 306 }; 320 307 321 308 #endif
+8 -1
drivers/gpu/drm/xe/xe_execlist.c
··· 421 421 static void execlist_exec_queue_destroy(struct xe_exec_queue *q) 422 422 { 423 423 INIT_WORK(&q->execlist->destroy_async, execlist_exec_queue_destroy_async); 424 - queue_work(system_unbound_wq, &q->execlist->destroy_async); 424 + queue_work(system_dfl_wq, &q->execlist->destroy_async); 425 425 } 426 426 427 427 static int execlist_exec_queue_set_priority(struct xe_exec_queue *q, ··· 468 468 return false; 469 469 } 470 470 471 + static bool execlist_exec_queue_active(struct xe_exec_queue *q) 472 + { 473 + /* NIY */ 474 + return false; 475 + } 476 + 471 477 static const struct xe_exec_queue_ops execlist_exec_queue_ops = { 472 478 .init = execlist_exec_queue_init, 473 479 .kill = execlist_exec_queue_kill, ··· 486 480 .suspend_wait = execlist_exec_queue_suspend_wait, 487 481 .resume = execlist_exec_queue_resume, 488 482 .reset_status = execlist_exec_queue_reset_status, 483 + .active = execlist_exec_queue_active, 489 484 }; 490 485 491 486 int xe_execlist_init(struct xe_gt *gt)
+40 -6
drivers/gpu/drm/xe/xe_force_wake.c
··· 148 148 return __domain_wait(gt, domain, false); 149 149 } 150 150 151 - #define for_each_fw_domain_masked(domain__, mask__, fw__, tmp__) \ 152 - for (tmp__ = (mask__); tmp__; tmp__ &= ~BIT(ffs(tmp__) - 1)) \ 153 - for_each_if((domain__ = ((fw__)->domains + \ 154 - (ffs(tmp__) - 1))) && \ 155 - domain__->reg_ctl.addr) 156 - 157 151 /** 158 152 * xe_force_wake_get() : Increase the domain refcount 159 153 * @fw: struct xe_force_wake ··· 259 265 260 266 xe_gt_WARN(gt, ack_fail, "Forcewake domain%s %#x failed to acknowledge sleep request\n", 261 267 str_plural(hweight_long(ack_fail)), ack_fail); 268 + } 269 + 270 + const char *xe_force_wake_domain_to_str(enum xe_force_wake_domain_id id) 271 + { 272 + switch (id) { 273 + case XE_FW_DOMAIN_ID_GT: 274 + return "GT"; 275 + case XE_FW_DOMAIN_ID_RENDER: 276 + return "Render"; 277 + case XE_FW_DOMAIN_ID_MEDIA: 278 + return "Media"; 279 + case XE_FW_DOMAIN_ID_MEDIA_VDBOX0: 280 + return "VDBox0"; 281 + case XE_FW_DOMAIN_ID_MEDIA_VDBOX1: 282 + return "VDBox1"; 283 + case XE_FW_DOMAIN_ID_MEDIA_VDBOX2: 284 + return "VDBox2"; 285 + case XE_FW_DOMAIN_ID_MEDIA_VDBOX3: 286 + return "VDBox3"; 287 + case XE_FW_DOMAIN_ID_MEDIA_VDBOX4: 288 + return "VDBox4"; 289 + case XE_FW_DOMAIN_ID_MEDIA_VDBOX5: 290 + return "VDBox5"; 291 + case XE_FW_DOMAIN_ID_MEDIA_VDBOX6: 292 + return "VDBox6"; 293 + case XE_FW_DOMAIN_ID_MEDIA_VDBOX7: 294 + return "VDBox7"; 295 + case XE_FW_DOMAIN_ID_MEDIA_VEBOX0: 296 + return "VEBox0"; 297 + case XE_FW_DOMAIN_ID_MEDIA_VEBOX1: 298 + return "VEBox1"; 299 + case XE_FW_DOMAIN_ID_MEDIA_VEBOX2: 300 + return "VEBox2"; 301 + case XE_FW_DOMAIN_ID_MEDIA_VEBOX3: 302 + return "VEBox3"; 303 + case XE_FW_DOMAIN_ID_GSC: 304 + return "GSC"; 305 + default: 306 + return "Unknown"; 307 + } 262 308 }
+11
drivers/gpu/drm/xe/xe_force_wake.h
··· 19 19 enum xe_force_wake_domains domains); 20 20 void xe_force_wake_put(struct xe_force_wake *fw, unsigned int fw_ref); 21 21 22 + const char *xe_force_wake_domain_to_str(enum xe_force_wake_domain_id id); 23 + 24 + #define for_each_fw_domain_masked(domain__, mask__, fw__, tmp__) \ 25 + for (tmp__ = (mask__); tmp__; tmp__ &= ~BIT(ffs(tmp__) - 1)) \ 26 + for_each_if(((domain__) = ((fw__)->domains + \ 27 + (ffs(tmp__) - 1))) && \ 28 + (domain__)->reg_ctl.addr) 29 + 30 + #define for_each_fw_domain(domain__, fw__, tmp__) \ 31 + for_each_fw_domain_masked((domain__), (fw__)->initialized_domains, (fw__), (tmp__)) 32 + 22 33 static inline int 23 34 xe_force_wake_ref(struct xe_force_wake *fw, 24 35 enum xe_force_wake_domains domain)
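The iterator moved into this header walks the set bits of a domain mask and skips slots without a control register; for_each_fw_domain() simply feeds it fw->initialized_domains. Combined with xe_force_wake_domain_to_str() it lets callers enumerate domains for diagnostics, as the powergate debugfs change further below does. A short sketch of such a walk, with the wrapper function itself being illustrative:

/* Sketch: enumerate every initialized forcewake domain by name. */
static void name_fw_domains(struct xe_gt *gt)
{
	struct xe_force_wake *fw = gt_to_fw(gt);
	struct xe_force_wake_domain *domain;
	unsigned int tmp;

	for_each_fw_domain(domain, fw, tmp)
		xe_gt_dbg(gt, "forcewake domain %s present\n",
			  xe_force_wake_domain_to_str(domain->id));
}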
+159 -216
drivers/gpu/drm/xe/xe_ggtt.c
··· 69 69 /** 70 70 * struct xe_ggtt_node - A node in GGTT. 71 71 * 72 - * This struct needs to be initialized (only-once) with xe_ggtt_node_init() before any node 73 - * insertion, reservation, or 'ballooning'. 74 - * It will, then, be finalized by either xe_ggtt_node_remove() or xe_ggtt_node_deballoon(). 72 + * This struct is allocated with xe_ggtt_insert_node(,_transform) or xe_ggtt_insert_bo(,_at). 73 + * It will be deallocated using xe_ggtt_node_remove(). 75 74 */ 76 75 struct xe_ggtt_node { 77 76 /** @ggtt: Back pointer to xe_ggtt where this region will be inserted at */ ··· 81 82 struct work_struct delayed_removal_work; 82 83 /** @invalidate_on_remove: If it needs invalidation upon removal */ 83 84 bool invalidate_on_remove; 85 + }; 86 + 87 + /** 88 + * struct xe_ggtt_pt_ops - GGTT Page table operations 89 + * Which can vary from platform to platform. 90 + */ 91 + struct xe_ggtt_pt_ops { 92 + /** @pte_encode_flags: Encode PTE flags for a given BO */ 93 + u64 (*pte_encode_flags)(struct xe_bo *bo, u16 pat_index); 94 + 95 + /** @ggtt_set_pte: Directly write into GGTT's PTE */ 96 + xe_ggtt_set_pte_fn ggtt_set_pte; 97 + 98 + /** @ggtt_get_pte: Directly read from GGTT's PTE */ 99 + u64 (*ggtt_get_pte)(struct xe_ggtt *ggtt, u64 addr); 100 + }; 101 + 102 + /** 103 + * struct xe_ggtt - Main GGTT struct 104 + * 105 + * In general, each tile can contains its own Global Graphics Translation Table 106 + * (GGTT) instance. 107 + */ 108 + struct xe_ggtt { 109 + /** @tile: Back pointer to tile where this GGTT belongs */ 110 + struct xe_tile *tile; 111 + /** @start: Start offset of GGTT */ 112 + u64 start; 113 + /** @size: Total usable size of this GGTT */ 114 + u64 size; 115 + 116 + #define XE_GGTT_FLAGS_64K BIT(0) 117 + /** 118 + * @flags: Flags for this GGTT 119 + * Acceptable flags: 120 + * - %XE_GGTT_FLAGS_64K - if PTE size is 64K. Otherwise, regular is 4K. 
121 + */ 122 + unsigned int flags; 123 + /** @scratch: Internal object allocation used as a scratch page */ 124 + struct xe_bo *scratch; 125 + /** @lock: Mutex lock to protect GGTT data */ 126 + struct mutex lock; 127 + /** 128 + * @gsm: The iomem pointer to the actual location of the translation 129 + * table located in the GSM for easy PTE manipulation 130 + */ 131 + u64 __iomem *gsm; 132 + /** @pt_ops: Page Table operations per platform */ 133 + const struct xe_ggtt_pt_ops *pt_ops; 134 + /** @mm: The memory manager used to manage individual GGTT allocations */ 135 + struct drm_mm mm; 136 + /** @access_count: counts GGTT writes */ 137 + unsigned int access_count; 138 + /** @wq: Dedicated unordered work queue to process node removals */ 139 + struct workqueue_struct *wq; 84 140 }; 85 141 86 142 static u64 xelp_ggtt_pte_flags(struct xe_bo *bo, u16 pat_index) ··· 247 193 static u64 xe_ggtt_get_pte(struct xe_ggtt *ggtt, u64 addr) 248 194 { 249 195 xe_tile_assert(ggtt->tile, !(addr & XE_PTE_MASK)); 250 - xe_tile_assert(ggtt->tile, addr < ggtt->size); 196 + xe_tile_assert(ggtt->tile, addr < ggtt->start + ggtt->size); 251 197 252 198 return readq(&ggtt->gsm[addr >> XE_PTE_SHIFT]); 253 199 } ··· 353 299 { 354 300 ggtt->start = start; 355 301 ggtt->size = size; 356 - drm_mm_init(&ggtt->mm, start, size); 302 + drm_mm_init(&ggtt->mm, 0, size); 357 303 } 358 304 359 305 int xe_ggtt_init_kunit(struct xe_ggtt *ggtt, u32 start, u32 size) ··· 401 347 ggtt_start = wopcm; 402 348 ggtt_size = (gsm_size / 8) * (u64)XE_PAGE_SIZE - ggtt_start; 403 349 } else { 404 - /* GGTT is expected to be 4GiB */ 405 - ggtt_start = wopcm; 406 - ggtt_size = SZ_4G - ggtt_start; 350 + ggtt_start = xe_tile_sriov_vf_ggtt_base(ggtt->tile); 351 + ggtt_size = xe_tile_sriov_vf_ggtt(ggtt->tile); 352 + 353 + if (ggtt_start < wopcm || 354 + ggtt_start + ggtt_size > GUC_GGTT_TOP) { 355 + xe_tile_err(ggtt->tile, "Invalid GGTT configuration: %#llx-%#llx\n", 356 + ggtt_start, ggtt_start + ggtt_size - 1); 357 + return -ERANGE; 358 + } 407 359 } 408 360 409 361 ggtt->gsm = ggtt->tile->mmio.regs + SZ_8M; ··· 427 367 else 428 368 ggtt->pt_ops = &xelp_pt_ops; 429 369 430 - ggtt->wq = alloc_workqueue("xe-ggtt-wq", WQ_MEM_RECLAIM, 0); 370 + ggtt->wq = alloc_workqueue("xe-ggtt-wq", WQ_MEM_RECLAIM | WQ_PERCPU, 0); 431 371 if (!ggtt->wq) 432 372 return -ENOMEM; 433 373 ··· 437 377 if (err) 438 378 return err; 439 379 440 - err = devm_add_action_or_reset(xe->drm.dev, dev_fini_ggtt, ggtt); 441 - if (err) 442 - return err; 443 - 444 - if (IS_SRIOV_VF(xe)) { 445 - err = xe_tile_sriov_vf_prepare_ggtt(ggtt->tile); 446 - if (err) 447 - return err; 448 - } 449 - 450 - return 0; 380 + return devm_add_action_or_reset(xe->drm.dev, dev_fini_ggtt, ggtt); 451 381 } 452 382 ALLOW_ERROR_INJECTION(xe_ggtt_init_early, ERRNO); /* See xe_pci_probe() */ 453 383 ··· 451 401 /* Display may have allocated inside ggtt, so be careful with clearing here */ 452 402 mutex_lock(&ggtt->lock); 453 403 drm_mm_for_each_hole(hole, &ggtt->mm, start, end) 454 - xe_ggtt_clear(ggtt, start, end - start); 404 + xe_ggtt_clear(ggtt, ggtt->start + start, end - start); 455 405 456 406 xe_ggtt_invalidate(ggtt); 457 407 mutex_unlock(&ggtt->lock); 408 + } 409 + 410 + static void ggtt_node_fini(struct xe_ggtt_node *node) 411 + { 412 + kfree(node); 458 413 } 459 414 460 415 static void ggtt_node_remove(struct xe_ggtt_node *node) ··· 473 418 474 419 mutex_lock(&ggtt->lock); 475 420 if (bound) 476 - xe_ggtt_clear(ggtt, node->base.start, node->base.size); 421 + xe_ggtt_clear(ggtt, 
xe_ggtt_node_addr(node), xe_ggtt_node_size(node)); 477 422 drm_mm_remove_node(&node->base); 478 423 node->base.size = 0; 479 424 mutex_unlock(&ggtt->lock); ··· 487 432 drm_dev_exit(idx); 488 433 489 434 free_node: 490 - xe_ggtt_node_fini(node); 435 + ggtt_node_fini(node); 491 436 } 492 437 493 438 static void ggtt_node_remove_work_func(struct work_struct *work) ··· 593 538 ggtt_invalidate_gt_tlb(ggtt->tile->media_gt); 594 539 } 595 540 596 - static void xe_ggtt_dump_node(struct xe_ggtt *ggtt, 597 - const struct drm_mm_node *node, const char *description) 598 - { 599 - char buf[10]; 600 - 601 - if (IS_ENABLED(CONFIG_DRM_XE_DEBUG)) { 602 - string_get_size(node->size, 1, STRING_UNITS_2, buf, sizeof(buf)); 603 - xe_tile_dbg(ggtt->tile, "GGTT %#llx-%#llx (%s) %s\n", 604 - node->start, node->start + node->size, buf, description); 605 - } 606 - } 607 - 608 541 /** 609 - * xe_ggtt_node_insert_balloon_locked - prevent allocation of specified GGTT addresses 610 - * @node: the &xe_ggtt_node to hold reserved GGTT node 611 - * @start: the starting GGTT address of the reserved region 612 - * @end: then end GGTT address of the reserved region 613 - * 614 - * To be used in cases where ggtt->lock is already taken. 615 - * Use xe_ggtt_node_remove_balloon_locked() to release a reserved GGTT node. 616 - * 617 - * Return: 0 on success or a negative error code on failure. 618 - */ 619 - int xe_ggtt_node_insert_balloon_locked(struct xe_ggtt_node *node, u64 start, u64 end) 620 - { 621 - struct xe_ggtt *ggtt = node->ggtt; 622 - int err; 623 - 624 - xe_tile_assert(ggtt->tile, start < end); 625 - xe_tile_assert(ggtt->tile, IS_ALIGNED(start, XE_PAGE_SIZE)); 626 - xe_tile_assert(ggtt->tile, IS_ALIGNED(end, XE_PAGE_SIZE)); 627 - xe_tile_assert(ggtt->tile, !drm_mm_node_allocated(&node->base)); 628 - lockdep_assert_held(&ggtt->lock); 629 - 630 - node->base.color = 0; 631 - node->base.start = start; 632 - node->base.size = end - start; 633 - 634 - err = drm_mm_reserve_node(&ggtt->mm, &node->base); 635 - 636 - if (xe_tile_WARN(ggtt->tile, err, "Failed to balloon GGTT %#llx-%#llx (%pe)\n", 637 - node->base.start, node->base.start + node->base.size, ERR_PTR(err))) 638 - return err; 639 - 640 - xe_ggtt_dump_node(ggtt, &node->base, "balloon"); 641 - return 0; 642 - } 643 - 644 - /** 645 - * xe_ggtt_node_remove_balloon_locked - release a reserved GGTT region 646 - * @node: the &xe_ggtt_node with reserved GGTT region 647 - * 648 - * To be used in cases where ggtt->lock is already taken. 649 - * See xe_ggtt_node_insert_balloon_locked() for details. 650 - */ 651 - void xe_ggtt_node_remove_balloon_locked(struct xe_ggtt_node *node) 652 - { 653 - if (!xe_ggtt_node_allocated(node)) 654 - return; 655 - 656 - lockdep_assert_held(&node->ggtt->lock); 657 - 658 - xe_ggtt_dump_node(node->ggtt, &node->base, "remove-balloon"); 659 - 660 - drm_mm_remove_node(&node->base); 661 - } 662 - 663 - static void xe_ggtt_assert_fit(struct xe_ggtt *ggtt, u64 start, u64 size) 664 - { 665 - struct xe_tile *tile = ggtt->tile; 666 - 667 - xe_tile_assert(tile, start >= ggtt->start); 668 - xe_tile_assert(tile, start + size <= ggtt->start + ggtt->size); 669 - } 670 - 671 - /** 672 - * xe_ggtt_shift_nodes_locked - Shift GGTT nodes to adjust for a change in usable address range. 542 + * xe_ggtt_shift_nodes() - Shift GGTT nodes to adjust for a change in usable address range. 
673 543 * @ggtt: the &xe_ggtt struct instance 674 - * @shift: change to the location of area provisioned for current VF 544 + * @new_start: new location of area provisioned for current VF 675 545 * 676 - * This function moves all nodes from the GGTT VM, to a temp list. These nodes are expected 677 - * to represent allocations in range formerly assigned to current VF, before the range changed. 678 - * When the GGTT VM is completely clear of any nodes, they are re-added with shifted offsets. 546 + * Ensure that all struct &xe_ggtt_node are moved to the @new_start base address 547 + * by changing the base offset of the GGTT. 679 548 * 680 - * The function has no ability of failing - because it shifts existing nodes, without 681 - * any additional processing. If the nodes were successfully existing at the old address, 682 - * they will do the same at the new one. A fail inside this function would indicate that 683 - * the list of nodes was either already damaged, or that the shift brings the address range 684 - * outside of valid bounds. Both cases justify an assert rather than error code. 549 + * This function may be called multiple times during recovery, but if 550 + * @new_start is unchanged from the current base, it's a noop. 551 + * 552 + * @new_start should be a value between xe_wopcm_size() and #GUC_GGTT_TOP. 685 553 */ 686 - void xe_ggtt_shift_nodes_locked(struct xe_ggtt *ggtt, s64 shift) 554 + void xe_ggtt_shift_nodes(struct xe_ggtt *ggtt, u64 new_start) 687 555 { 688 - struct xe_tile *tile __maybe_unused = ggtt->tile; 689 - struct drm_mm_node *node, *tmpn; 690 - LIST_HEAD(temp_list_head); 556 + guard(mutex)(&ggtt->lock); 691 557 692 - lockdep_assert_held(&ggtt->lock); 558 + xe_tile_assert(ggtt->tile, new_start >= xe_wopcm_size(tile_to_xe(ggtt->tile))); 559 + xe_tile_assert(ggtt->tile, new_start + ggtt->size <= GUC_GGTT_TOP); 693 560 694 - if (IS_ENABLED(CONFIG_DRM_XE_DEBUG)) 695 - drm_mm_for_each_node_safe(node, tmpn, &ggtt->mm) 696 - xe_ggtt_assert_fit(ggtt, node->start + shift, node->size); 697 - 698 - drm_mm_for_each_node_safe(node, tmpn, &ggtt->mm) { 699 - drm_mm_remove_node(node); 700 - list_add(&node->node_list, &temp_list_head); 701 - } 702 - 703 - list_for_each_entry_safe(node, tmpn, &temp_list_head, node_list) { 704 - list_del(&node->node_list); 705 - node->start += shift; 706 - drm_mm_reserve_node(&ggtt->mm, node); 707 - xe_tile_assert(tile, drm_mm_node_allocated(node)); 708 - } 561 + /* pairs with READ_ONCE in xe_ggtt_node_addr() */ 562 + WRITE_ONCE(ggtt->start, new_start); 709 563 } 710 564 711 - static int xe_ggtt_node_insert_locked(struct xe_ggtt_node *node, 565 + static int xe_ggtt_insert_node_locked(struct xe_ggtt_node *node, 712 566 u32 size, u32 align, u32 mm_flags) 713 567 { 714 568 return drm_mm_insert_node_generic(&node->ggtt->mm, &node->base, size, align, 0, 715 569 mm_flags); 716 570 } 717 571 718 - /** 719 - * xe_ggtt_node_insert - Insert a &xe_ggtt_node into the GGTT 720 - * @node: the &xe_ggtt_node to be inserted 721 - * @size: size of the node 722 - * @align: alignment constrain of the node 723 - * 724 - * It cannot be called without first having called xe_ggtt_init() once. 725 - * 726 - * Return: 0 on success or a negative error code on failure. 
727 - */ 728 - int xe_ggtt_node_insert(struct xe_ggtt_node *node, u32 size, u32 align) 729 - { 730 - int ret; 731 - 732 - if (!node || !node->ggtt) 733 - return -ENOENT; 734 - 735 - mutex_lock(&node->ggtt->lock); 736 - ret = xe_ggtt_node_insert_locked(node, size, align, 737 - DRM_MM_INSERT_HIGH); 738 - mutex_unlock(&node->ggtt->lock); 739 - 740 - return ret; 741 - } 742 - 743 - /** 744 - * xe_ggtt_node_init - Initialize %xe_ggtt_node struct 745 - * @ggtt: the &xe_ggtt where the new node will later be inserted/reserved. 746 - * 747 - * This function will allocate the struct %xe_ggtt_node and return its pointer. 748 - * This struct will then be freed after the node removal upon xe_ggtt_node_remove() 749 - * or xe_ggtt_node_remove_balloon_locked(). 750 - * 751 - * Having %xe_ggtt_node struct allocated doesn't mean that the node is already 752 - * allocated in GGTT. Only xe_ggtt_node_insert(), allocation through 753 - * xe_ggtt_node_insert_transform(), or xe_ggtt_node_insert_balloon_locked() will ensure the node is inserted or reserved 754 - * in GGTT. 755 - * 756 - * Return: A pointer to %xe_ggtt_node struct on success. An ERR_PTR otherwise. 757 - **/ 758 - struct xe_ggtt_node *xe_ggtt_node_init(struct xe_ggtt *ggtt) 572 + static struct xe_ggtt_node *ggtt_node_init(struct xe_ggtt *ggtt) 759 573 { 760 574 struct xe_ggtt_node *node = kzalloc_obj(*node, GFP_NOFS); 761 575 ··· 638 714 } 639 715 640 716 /** 641 - * xe_ggtt_node_fini - Forcebly finalize %xe_ggtt_node struct 642 - * @node: the &xe_ggtt_node to be freed 717 + * xe_ggtt_insert_node - Insert a &xe_ggtt_node into the GGTT 718 + * @ggtt: the &xe_ggtt into which the node should be inserted. 719 + * @size: size of the node 720 + * @align: alignment constrain of the node 643 721 * 644 - * If anything went wrong with either xe_ggtt_node_insert(), xe_ggtt_node_insert_locked(), 645 - * or xe_ggtt_node_insert_balloon_locked(); and this @node is not going to be reused, then, 646 - * this function needs to be called to free the %xe_ggtt_node struct 647 - **/ 648 - void xe_ggtt_node_fini(struct xe_ggtt_node *node) 649 - { 650 - kfree(node); 651 - } 652 - 653 - /** 654 - * xe_ggtt_node_allocated - Check if node is allocated in GGTT 655 - * @node: the &xe_ggtt_node to be inspected 656 - * 657 - * Return: True if allocated, False otherwise. 722 + * Return: &xe_ggtt_node on success or a ERR_PTR on failure. 
658 723 */ 659 - bool xe_ggtt_node_allocated(const struct xe_ggtt_node *node) 724 + struct xe_ggtt_node *xe_ggtt_insert_node(struct xe_ggtt *ggtt, u32 size, u32 align) 660 725 { 661 - if (!node || !node->ggtt) 662 - return false; 726 + struct xe_ggtt_node *node; 727 + int ret; 663 728 664 - return drm_mm_node_allocated(&node->base); 729 + node = ggtt_node_init(ggtt); 730 + if (IS_ERR(node)) 731 + return node; 732 + 733 + guard(mutex)(&ggtt->lock); 734 + ret = xe_ggtt_insert_node_locked(node, size, align, 735 + DRM_MM_INSERT_HIGH); 736 + if (ret) { 737 + ggtt_node_fini(node); 738 + return ERR_PTR(ret); 739 + } 740 + 741 + return node; 665 742 } 666 743 667 744 /** ··· 695 770 if (XE_WARN_ON(!node)) 696 771 return; 697 772 698 - start = node->base.start; 773 + start = xe_ggtt_node_addr(node); 699 774 end = start + xe_bo_size(bo); 700 775 701 776 if (!xe_bo_is_vram(bo) && !xe_bo_is_stolen(bo)) { ··· 736 811 } 737 812 738 813 /** 739 - * xe_ggtt_node_insert_transform - Insert a newly allocated &xe_ggtt_node into the GGTT 814 + * xe_ggtt_insert_node_transform - Insert a newly allocated &xe_ggtt_node into the GGTT 740 815 * @ggtt: the &xe_ggtt where the node will inserted/reserved. 741 816 * @bo: The bo to be transformed 742 817 * @pte_flags: The extra GGTT flags to add to mapping. ··· 750 825 * 751 826 * Return: A pointer to %xe_ggtt_node struct on success. An ERR_PTR otherwise. 752 827 */ 753 - struct xe_ggtt_node *xe_ggtt_node_insert_transform(struct xe_ggtt *ggtt, 828 + struct xe_ggtt_node *xe_ggtt_insert_node_transform(struct xe_ggtt *ggtt, 754 829 struct xe_bo *bo, u64 pte_flags, 755 830 u64 size, u32 align, 756 831 xe_ggtt_transform_cb transform, void *arg) ··· 758 833 struct xe_ggtt_node *node; 759 834 int ret; 760 835 761 - node = xe_ggtt_node_init(ggtt); 836 + node = ggtt_node_init(ggtt); 762 837 if (IS_ERR(node)) 763 838 return ERR_CAST(node); 764 839 ··· 767 842 goto err; 768 843 } 769 844 770 - ret = xe_ggtt_node_insert_locked(node, size, align, 0); 845 + ret = xe_ggtt_insert_node_locked(node, size, align, 0); 771 846 if (ret) 772 847 goto err_unlock; 773 848 ··· 782 857 err_unlock: 783 858 mutex_unlock(&ggtt->lock); 784 859 err: 785 - xe_ggtt_node_fini(node); 860 + ggtt_node_fini(node); 786 861 return ERR_PTR(ret); 787 862 } 788 863 ··· 808 883 809 884 xe_pm_runtime_get_noresume(tile_to_xe(ggtt->tile)); 810 885 811 - bo->ggtt_node[tile_id] = xe_ggtt_node_init(ggtt); 886 + bo->ggtt_node[tile_id] = ggtt_node_init(ggtt); 812 887 if (IS_ERR(bo->ggtt_node[tile_id])) { 813 888 err = PTR_ERR(bo->ggtt_node[tile_id]); 814 889 bo->ggtt_node[tile_id] = NULL; ··· 816 891 } 817 892 818 893 mutex_lock(&ggtt->lock); 894 + /* 895 + * When inheriting the initial framebuffer, the framebuffer is 896 + * physically located at VRAM address 0, and usually at GGTT address 0 too. 897 + * 898 + * The display code will ask for a GGTT allocation between end of BO and 899 + * remainder of GGTT, unaware that the start is reserved by WOPCM. 
900 + */ 901 + if (start >= ggtt->start) 902 + start -= ggtt->start; 903 + else 904 + start = 0; 905 + 906 + /* Should never happen, but since we handle start, fail graciously for end */ 907 + if (end >= ggtt->start) 908 + end -= ggtt->start; 909 + else 910 + end = 0; 911 + 912 + xe_tile_assert(ggtt->tile, end >= start + xe_bo_size(bo)); 913 + 819 914 err = drm_mm_insert_node_in_range(&ggtt->mm, &bo->ggtt_node[tile_id]->base, 820 915 xe_bo_size(bo), alignment, 0, start, end, 0); 821 916 if (err) { 822 - xe_ggtt_node_fini(bo->ggtt_node[tile_id]); 917 + ggtt_node_fini(bo->ggtt_node[tile_id]); 823 918 bo->ggtt_node[tile_id] = NULL; 824 919 } else { 825 920 u16 cache_mode = bo->flags & XE_BO_FLAG_NEEDS_UC ? XE_CACHE_NONE : XE_CACHE_WB; ··· 947 1002 return FIELD_PREP(GGTT_PTE_VFID, vfid) | XE_PAGE_PRESENT; 948 1003 } 949 1004 950 - static void xe_ggtt_assign_locked(struct xe_ggtt *ggtt, const struct drm_mm_node *node, u16 vfid) 1005 + static void xe_ggtt_assign_locked(const struct xe_ggtt_node *node, u16 vfid) 951 1006 { 952 - u64 start = node->start; 953 - u64 size = node->size; 1007 + struct xe_ggtt *ggtt = node->ggtt; 1008 + u64 start = xe_ggtt_node_addr(node); 1009 + u64 size = xe_ggtt_node_size(node); 954 1010 u64 end = start + size - 1; 955 1011 u64 pte = xe_encode_vfid_pte(vfid); 956 1012 957 1013 lockdep_assert_held(&ggtt->lock); 958 - 959 - if (!drm_mm_node_allocated(node)) 960 - return; 961 1014 962 1015 while (start < end) { 963 1016 ggtt->pt_ops->ggtt_set_pte(ggtt, start, pte); ··· 976 1033 */ 977 1034 void xe_ggtt_assign(const struct xe_ggtt_node *node, u16 vfid) 978 1035 { 979 - mutex_lock(&node->ggtt->lock); 980 - xe_ggtt_assign_locked(node->ggtt, &node->base, vfid); 981 - mutex_unlock(&node->ggtt->lock); 1036 + guard(mutex)(&node->ggtt->lock); 1037 + xe_ggtt_assign_locked(node, vfid); 982 1038 } 983 1039 984 1040 /** ··· 999 1057 if (!node) 1000 1058 return -ENOENT; 1001 1059 1002 - guard(mutex)(&node->ggtt->lock); 1060 + ggtt = node->ggtt; 1061 + guard(mutex)(&ggtt->lock); 1003 1062 1004 1063 if (xe_ggtt_node_pt_size(node) != size) 1005 1064 return -EINVAL; 1006 1065 1007 - ggtt = node->ggtt; 1008 - start = node->base.start; 1009 - end = start + node->base.size - 1; 1066 + start = xe_ggtt_node_addr(node); 1067 + end = start + xe_ggtt_node_size(node) - 1; 1010 1068 1011 1069 while (start < end) { 1012 1070 pte = ggtt->pt_ops->ggtt_get_pte(ggtt, start); ··· 1039 1097 if (!node) 1040 1098 return -ENOENT; 1041 1099 1042 - guard(mutex)(&node->ggtt->lock); 1100 + ggtt = node->ggtt; 1101 + guard(mutex)(&ggtt->lock); 1043 1102 1044 1103 if (xe_ggtt_node_pt_size(node) != size) 1045 1104 return -EINVAL; 1046 1105 1047 - ggtt = node->ggtt; 1048 - start = node->base.start; 1049 - end = start + node->base.size - 1; 1106 + start = xe_ggtt_node_addr(node); 1107 + end = start + xe_ggtt_node_size(node) - 1; 1050 1108 1051 1109 while (start < end) { 1052 1110 vfid_pte = u64_replace_bits(*buf++, vfid, GGTT_PTE_VFID); ··· 1153 1211 */ 1154 1212 u64 xe_ggtt_node_addr(const struct xe_ggtt_node *node) 1155 1213 { 1156 - return node->base.start; 1214 + /* pairs with WRITE_ONCE in xe_ggtt_shift_nodes() */ 1215 + return node->base.start + READ_ONCE(node->ggtt->start); 1157 1216 } 1158 1217 1159 1218 /**
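With the rework above, GGTT node allocation and insertion are a single call and the returned node is released with xe_ggtt_node_remove(); xe_ggtt_node_addr() now folds in the (possibly shifted) GGTT base, so callers should re-query it rather than cache absolute addresses across a VF migration. A minimal usage sketch under those rules; the sizes and the pr_debug consumer are placeholders.

/* Sketch: allocate a GGTT range, use its effective address, release it. */
static int ggtt_range_demo(struct xe_ggtt *ggtt)
{
	struct xe_ggtt_node *node;

	node = xe_ggtt_insert_node(ggtt, SZ_64K, SZ_4K);
	if (IS_ERR(node))
		return PTR_ERR(node);

	/* Effective address = node offset + current GGTT base. */
	pr_debug("GGTT range at %#llx\n", xe_ggtt_node_addr(node));

	xe_ggtt_node_remove(node, false);	/* skip invalidation in this sketch */
	return 0;
}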
+5 -9
drivers/gpu/drm/xe/xe_ggtt.h
··· 9 9 #include "xe_ggtt_types.h" 10 10 11 11 struct drm_printer; 12 + struct xe_bo; 12 13 struct xe_tile; 13 14 struct drm_exec; 14 15 ··· 18 17 int xe_ggtt_init_kunit(struct xe_ggtt *ggtt, u32 reserved, u32 size); 19 18 int xe_ggtt_init(struct xe_ggtt *ggtt); 20 19 21 - struct xe_ggtt_node *xe_ggtt_node_init(struct xe_ggtt *ggtt); 22 - void xe_ggtt_node_fini(struct xe_ggtt_node *node); 23 - int xe_ggtt_node_insert_balloon_locked(struct xe_ggtt_node *node, 24 - u64 start, u64 size); 25 - void xe_ggtt_node_remove_balloon_locked(struct xe_ggtt_node *node); 26 - void xe_ggtt_shift_nodes_locked(struct xe_ggtt *ggtt, s64 shift); 20 + void xe_ggtt_shift_nodes(struct xe_ggtt *ggtt, u64 new_base); 27 21 u64 xe_ggtt_start(struct xe_ggtt *ggtt); 28 22 u64 xe_ggtt_size(struct xe_ggtt *ggtt); 29 23 30 - int xe_ggtt_node_insert(struct xe_ggtt_node *node, u32 size, u32 align); 31 24 struct xe_ggtt_node * 32 - xe_ggtt_node_insert_transform(struct xe_ggtt *ggtt, 25 + xe_ggtt_insert_node(struct xe_ggtt *ggtt, u32 size, u32 align); 26 + struct xe_ggtt_node * 27 + xe_ggtt_insert_node_transform(struct xe_ggtt *ggtt, 33 28 struct xe_bo *bo, u64 pte, 34 29 u64 size, u32 align, 35 30 xe_ggtt_transform_cb transform, void *arg); 36 31 void xe_ggtt_node_remove(struct xe_ggtt_node *node, bool invalidate); 37 - bool xe_ggtt_node_allocated(const struct xe_ggtt_node *node); 38 32 size_t xe_ggtt_node_pt_size(const struct xe_ggtt_node *node); 39 33 void xe_ggtt_map_bo_unlocked(struct xe_ggtt *ggtt, struct xe_bo *bo); 40 34 int xe_ggtt_insert_bo(struct xe_ggtt *ggtt, struct xe_bo *bo, struct drm_exec *exec);
+2 -58
drivers/gpu/drm/xe/xe_ggtt_types.h
··· 6 6 #ifndef _XE_GGTT_TYPES_H_ 7 7 #define _XE_GGTT_TYPES_H_ 8 8 9 + #include <linux/types.h> 9 10 #include <drm/drm_mm.h> 10 11 11 - #include "xe_pt_types.h" 12 - 13 - struct xe_bo; 12 + struct xe_ggtt; 14 13 struct xe_ggtt_node; 15 - struct xe_gt; 16 - 17 - /** 18 - * struct xe_ggtt - Main GGTT struct 19 - * 20 - * In general, each tile can contains its own Global Graphics Translation Table 21 - * (GGTT) instance. 22 - */ 23 - struct xe_ggtt { 24 - /** @tile: Back pointer to tile where this GGTT belongs */ 25 - struct xe_tile *tile; 26 - /** @start: Start offset of GGTT */ 27 - u64 start; 28 - /** @size: Total usable size of this GGTT */ 29 - u64 size; 30 - 31 - #define XE_GGTT_FLAGS_64K BIT(0) 32 - /** 33 - * @flags: Flags for this GGTT 34 - * Acceptable flags: 35 - * - %XE_GGTT_FLAGS_64K - if PTE size is 64K. Otherwise, regular is 4K. 36 - */ 37 - unsigned int flags; 38 - /** @scratch: Internal object allocation used as a scratch page */ 39 - struct xe_bo *scratch; 40 - /** @lock: Mutex lock to protect GGTT data */ 41 - struct mutex lock; 42 - /** 43 - * @gsm: The iomem pointer to the actual location of the translation 44 - * table located in the GSM for easy PTE manipulation 45 - */ 46 - u64 __iomem *gsm; 47 - /** @pt_ops: Page Table operations per platform */ 48 - const struct xe_ggtt_pt_ops *pt_ops; 49 - /** @mm: The memory manager used to manage individual GGTT allocations */ 50 - struct drm_mm mm; 51 - /** @access_count: counts GGTT writes */ 52 - unsigned int access_count; 53 - /** @wq: Dedicated unordered work queue to process node removals */ 54 - struct workqueue_struct *wq; 55 - }; 56 14 57 15 typedef void (*xe_ggtt_set_pte_fn)(struct xe_ggtt *ggtt, u64 addr, u64 pte); 58 16 typedef void (*xe_ggtt_transform_cb)(struct xe_ggtt *ggtt, 59 17 struct xe_ggtt_node *node, 60 18 u64 pte_flags, 61 19 xe_ggtt_set_pte_fn set_pte, void *arg); 62 - /** 63 - * struct xe_ggtt_pt_ops - GGTT Page table operations 64 - * Which can vary from platform to platform. 65 - */ 66 - struct xe_ggtt_pt_ops { 67 - /** @pte_encode_flags: Encode PTE flags for a given BO */ 68 - u64 (*pte_encode_flags)(struct xe_bo *bo, u16 pat_index); 69 - 70 - /** @ggtt_set_pte: Directly write into GGTT's PTE */ 71 - xe_ggtt_set_pte_fn ggtt_set_pte; 72 - 73 - /** @ggtt_get_pte: Directly read from GGTT's PTE */ 74 - u64 (*ggtt_get_pte)(struct xe_ggtt *ggtt, u64 addr); 75 - }; 76 20 77 21 #endif
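Since struct xe_ggtt is now defined only inside xe_ggtt.c, the rest of the driver holds an opaque pointer and goes through the exported accessors such as xe_ggtt_start() and xe_ggtt_size() declared in xe_ggtt.h. A tiny sketch of what that looks like for a caller; the logging helper is hypothetical.

/* Sketch: querying an opaque GGTT through its accessors. */
static void log_ggtt_layout(struct xe_gt *gt, struct xe_ggtt *ggtt)
{
	xe_gt_dbg(gt, "GGTT spans %#llx-%#llx\n",
		  xe_ggtt_start(ggtt),
		  xe_ggtt_start(ggtt) + xe_ggtt_size(ggtt) - 1);
}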
+35 -8
drivers/gpu/drm/xe/xe_gsc_proxy.c
··· 435 435 return 0; 436 436 } 437 437 438 - static void xe_gsc_proxy_remove(void *arg) 438 + static void xe_gsc_proxy_stop(struct xe_gsc *gsc) 439 439 { 440 - struct xe_gsc *gsc = arg; 441 440 struct xe_gt *gt = gsc_to_gt(gsc); 442 441 struct xe_device *xe = gt_to_xe(gt); 443 - 444 - if (!gsc->proxy.component_added) 445 - return; 446 442 447 443 /* disable HECI2 IRQs */ 448 444 scoped_guard(xe_pm_runtime, xe) { ··· 451 455 } 452 456 453 457 xe_gsc_wait_for_worker_completion(gsc); 458 + gsc->proxy.started = false; 459 + } 460 + 461 + static void xe_gsc_proxy_remove(void *arg) 462 + { 463 + struct xe_gsc *gsc = arg; 464 + struct xe_gt *gt = gsc_to_gt(gsc); 465 + struct xe_device *xe = gt_to_xe(gt); 466 + 467 + if (!gsc->proxy.component_added) 468 + return; 469 + 470 + /* 471 + * GSC proxy start is an async process that can be ongoing during 472 + * Xe module load/unload. Using devm managed action to register 473 + * xe_gsc_proxy_stop could cause issues if Xe module unload has 474 + * already started when the action is registered, potentially leading 475 + * to the cleanup being called at the wrong time. Therefore, instead 476 + * of registering a separate devm action to undo what is done in 477 + * proxy start, we call it from here, but only if the start has 478 + * completed successfully (tracked with the 'started' flag). 479 + */ 480 + if (gsc->proxy.started) 481 + xe_gsc_proxy_stop(gsc); 454 482 455 483 component_del(xe->drm.dev, &xe_gsc_proxy_component_ops); 456 484 gsc->proxy.component_added = false; ··· 530 510 */ 531 511 int xe_gsc_proxy_start(struct xe_gsc *gsc) 532 512 { 513 + struct xe_gt *gt = gsc_to_gt(gsc); 533 514 int err; 534 515 535 516 /* enable the proxy interrupt in the GSC shim layer */ ··· 542 521 */ 543 522 err = xe_gsc_proxy_request_handler(gsc); 544 523 if (err) 545 - return err; 524 + goto err_irq_disable; 546 525 547 526 if (!xe_gsc_proxy_init_done(gsc)) { 548 - xe_gt_err(gsc_to_gt(gsc), "GSC FW reports proxy init not completed\n"); 549 - return -EIO; 527 + xe_gt_err(gt, "GSC FW reports proxy init not completed\n"); 528 + err = -EIO; 529 + goto err_irq_disable; 550 530 } 551 531 532 + gsc->proxy.started = true; 552 533 return 0; 534 + 535 + err_irq_disable: 536 + gsc_proxy_irq_toggle(gsc, false); 537 + return err; 553 538 }
+2
drivers/gpu/drm/xe/xe_gsc_types.h
··· 58 58 struct mutex mutex; 59 59 /** @proxy.component_added: whether the component has been added */ 60 60 bool component_added; 61 + /** @proxy.started: whether the proxy has been started */ 62 + bool started; 61 63 /** @proxy.bo: object to store message to and from the GSC */ 62 64 struct xe_bo *bo; 63 65 /** @proxy.to_gsc: map of the memory used to send messages to the GSC */
+7 -6
drivers/gpu/drm/xe/xe_gt.c
··· 33 33 #include "xe_gt_printk.h" 34 34 #include "xe_gt_sriov_pf.h" 35 35 #include "xe_gt_sriov_vf.h" 36 + #include "xe_gt_stats.h" 36 37 #include "xe_gt_sysfs.h" 37 38 #include "xe_gt_topology.h" 38 39 #include "xe_guc_exec_queue_types.h" ··· 142 141 static void xe_gt_enable_comp_1wcoh(struct xe_gt *gt) 143 142 { 144 143 struct xe_device *xe = gt_to_xe(gt); 145 - unsigned int fw_ref; 146 144 u32 reg; 147 145 148 146 if (IS_SRIOV_VF(xe)) 149 147 return; 150 148 151 149 if (GRAPHICS_VER(xe) >= 30 && xe->info.has_flat_ccs) { 152 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); 153 - if (!fw_ref) 150 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT); 151 + if (!fw_ref.domains) 154 152 return; 155 153 156 154 reg = xe_gt_mcr_unicast_read_any(gt, XE2_GAMREQSTRM_CTRL); ··· 163 163 reg |= EN_CMP_1WCOH_GW; 164 164 xe_gt_mcr_multicast_write(gt, XE2_GAMWALK_CTRL_3D, reg); 165 165 } 166 - 167 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 168 166 } 169 167 } 170 168 ··· 495 497 xe_gt_mmio_init(gt); 496 498 497 499 err = xe_uc_init_noalloc(&gt->uc); 500 + if (err) 501 + return err; 502 + 503 + err = xe_gt_stats_init(gt); 498 504 if (err) 499 505 return err; 500 506 ··· 896 894 if (IS_SRIOV_PF(gt_to_xe(gt))) 897 895 xe_gt_sriov_pf_stop_prepare(gt); 898 896 899 - xe_uc_gucrc_disable(&gt->uc); 900 897 xe_uc_stop_prepare(&gt->uc); 901 898 xe_pagefault_reset(gt_to_xe(gt), gt); 902 899
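xe_gt_enable_comp_1wcoh() above switches from manual xe_force_wake_get()/xe_force_wake_put() pairs to the scope-based CLASS(xe_force_wake, ...) constructor, which drops the reference automatically when the variable leaves scope. A sketch of the pattern as the converted code uses it, assuming (as that code implies) that fw_ref.domains is zero when nothing was acquired:

/* Sketch of the scoped forcewake pattern used above. */
static void poke_gt_reg(struct xe_gt *gt)
{
	CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT);

	if (!fw_ref.domains)
		return;		/* wake request failed, nothing acquired */

	/* MMIO accesses that need the GT domain awake go here. */
}					/* forcewake released automatically */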
+28 -14
drivers/gpu/drm/xe/xe_gt_ccs_mode.c
··· 13 13 #include "xe_gt_sysfs.h" 14 14 #include "xe_mmio.h" 15 15 #include "xe_sriov.h" 16 + #include "xe_sriov_pf.h" 16 17 17 18 static void __xe_gt_apply_ccs_mode(struct xe_gt *gt, u32 num_engines) 18 19 { ··· 89 88 __xe_gt_apply_ccs_mode(gt, gt->ccs_mode); 90 89 } 91 90 91 + static bool gt_ccs_mode_default(struct xe_gt *gt) 92 + { 93 + return gt->ccs_mode == 1; 94 + } 95 + 92 96 static ssize_t 93 97 num_cslices_show(struct device *kdev, 94 98 struct device_attribute *attr, char *buf) ··· 123 117 u32 num_engines, num_slices; 124 118 int ret; 125 119 126 - if (IS_SRIOV(xe)) { 127 - xe_gt_dbg(gt, "Can't change compute mode when running as %s\n", 128 - xe_sriov_mode_to_string(xe_device_sriov_mode(xe))); 129 - return -EOPNOTSUPP; 130 - } 131 - 132 120 ret = kstrtou32(buff, 0, &num_engines); 133 121 if (ret) 134 122 return ret; ··· 139 139 } 140 140 141 141 /* CCS mode can only be updated when there are no drm clients */ 142 - mutex_lock(&xe->drm.filelist_mutex); 142 + guard(mutex)(&xe->drm.filelist_mutex); 143 143 if (!list_empty(&xe->drm.filelist)) { 144 - mutex_unlock(&xe->drm.filelist_mutex); 145 144 xe_gt_dbg(gt, "Rejecting compute mode change as there are active drm clients\n"); 146 145 return -EBUSY; 147 146 } 148 147 149 - if (gt->ccs_mode != num_engines) { 150 - xe_gt_info(gt, "Setting compute mode to %d\n", num_engines); 151 - gt->ccs_mode = num_engines; 152 - xe_gt_record_user_engines(gt); 153 - xe_gt_reset(gt); 148 + if (gt->ccs_mode == num_engines) 149 + return count; 150 + 151 + /* 152 + * Changing default CCS mode is only allowed when there 153 + * are no VFs. Try to lockdown PF to find out. 154 + */ 155 + if (gt_ccs_mode_default(gt) && IS_SRIOV_PF(xe)) { 156 + ret = xe_sriov_pf_lockdown(xe); 157 + if (ret) { 158 + xe_gt_dbg(gt, "Can't change CCS Mode: VFs are enabled\n"); 159 + return ret; 160 + } 154 161 } 155 162 156 - mutex_unlock(&xe->drm.filelist_mutex); 163 + xe_gt_info(gt, "Setting compute mode to %d\n", num_engines); 164 + gt->ccs_mode = num_engines; 165 + xe_gt_record_user_engines(gt); 166 + xe_gt_reset(gt); 167 + 168 + /* We may end PF lockdown once CCS mode is default again */ 169 + if (gt_ccs_mode_default(gt) && IS_SRIOV_PF(xe)) 170 + xe_sriov_pf_end_lockdown(xe); 157 171 158 172 return count; 159 173 }
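The ccs_mode store handler now takes xe->drm.filelist_mutex through guard(mutex)(), the cleanup.h idiom that unlocks on every exit path; that is what lets the busy, unchanged-count and lockdown-failed branches simply return. A generic sketch of the idiom, with the helper and its parameters being illustrative:

/* Sketch of the guard(mutex) idiom relied on above. */
static int update_if_idle(struct mutex *lock, bool busy)
{
	guard(mutex)(lock);		/* unlocked automatically on return */

	if (busy)
		return -EBUSY;

	/* ... perform the update while the lock is held ... */
	return 0;
}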
+26
drivers/gpu/drm/xe/xe_gt_debugfs.c
··· 155 155 return 0; 156 156 } 157 157 158 + /* 159 + * Check the registers referenced on a save-restore list and report any 160 + * save-restore entries that did not get applied. 161 + */ 162 + static int register_save_restore_check(struct xe_gt *gt, struct drm_printer *p) 163 + { 164 + struct xe_hw_engine *hwe; 165 + enum xe_hw_engine_id id; 166 + 167 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FORCEWAKE_ALL); 168 + if (!xe_force_wake_ref_has_domain(fw_ref.domains, XE_FORCEWAKE_ALL)) { 169 + drm_printf(p, "ERROR: Could not acquire forcewake\n"); 170 + return -ETIMEDOUT; 171 + } 172 + 173 + xe_reg_sr_readback_check(&gt->reg_sr, gt, p); 174 + for_each_hw_engine(hwe, gt, id) 175 + xe_reg_sr_readback_check(&hwe->reg_sr, gt, p); 176 + for_each_hw_engine(hwe, gt, id) 177 + xe_reg_sr_lrc_check(&hwe->reg_lrc, gt, hwe, p); 178 + 179 + return 0; 180 + } 181 + 158 182 static int rcs_default_lrc(struct xe_gt *gt, struct drm_printer *p) 159 183 { 160 184 xe_lrc_dump_default(p, gt, XE_ENGINE_CLASS_RENDER); ··· 233 209 { "default_lrc_vecs", .show = xe_gt_debugfs_show_with_rpm, .data = vecs_default_lrc }, 234 210 { "hwconfig", .show = xe_gt_debugfs_show_with_rpm, .data = hwconfig }, 235 211 { "pat_sw_config", .show = xe_gt_debugfs_simple_show, .data = xe_pat_dump_sw_config }, 212 + { "register-save-restore-check", 213 + .show = xe_gt_debugfs_show_with_rpm, .data = register_save_restore_check }, 236 214 }; 237 215 238 216 /* everything else should be added here */
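The new register-save-restore-check entry reuses the existing table layout: a callback taking (gt, printer) plus a one-line table entry wired through xe_gt_debugfs_show_with_rpm. A sketch of adding a further check the same way; the entry name and callback are purely illustrative.

/* Illustrative extra GT check following the same table conventions. */
static int my_extra_check(struct xe_gt *gt, struct drm_printer *p)
{
	drm_printf(p, "nothing to report\n");
	return 0;
}

/* ...and one more row in the same debugfs table: */
/* { "my-extra-check", .show = xe_gt_debugfs_show_with_rpm, .data = my_extra_check }, */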
+25
drivers/gpu/drm/xe/xe_gt_idle.c
··· 168 168 xe_mmio_write32(&gt->mmio, POWERGATE_ENABLE, gtidle->powergate_enable); 169 169 } 170 170 171 + static void force_wake_domains_show(struct xe_gt *gt, struct drm_printer *p) 172 + { 173 + struct xe_force_wake_domain *domain; 174 + struct xe_force_wake *fw = gt_to_fw(gt); 175 + unsigned int tmp; 176 + unsigned long flags; 177 + 178 + spin_lock_irqsave(&fw->lock, flags); 179 + for_each_fw_domain(domain, fw, tmp) { 180 + drm_printf(p, "%s.ref_count=%u, %s.fwake=0x%x\n", 181 + xe_force_wake_domain_to_str(domain->id), 182 + READ_ONCE(domain->ref), 183 + xe_force_wake_domain_to_str(domain->id), 184 + xe_mmio_read32(&gt->mmio, domain->reg_ctl)); 185 + } 186 + spin_unlock_irqrestore(&fw->lock, flags); 187 + } 188 + 171 189 /** 172 190 * xe_gt_idle_pg_print - Xe powergating info 173 191 * @gt: GT object ··· 271 253 if (MEDIA_VERx100(xe) >= 1100 && MEDIA_VERx100(xe) < 1255) 272 254 drm_printf(p, "Media Samplers Power Gating Enabled: %s\n", 273 255 str_yes_no(pg_enabled & MEDIA_SAMPLERS_POWERGATE_ENABLE)); 256 + 257 + if (gt->info.engine_mask & BIT(XE_HW_ENGINE_GSCCS0)) { 258 + drm_printf(p, "GSC Power Gate Status: %s\n", 259 + str_up_down(pg_status & GSC_AWAKE_STATUS)); 260 + } 261 + 262 + force_wake_domains_show(gt, p); 274 263 275 264 return 0; 276 265 }
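For reference, the per-domain lines emitted by force_wake_domains_show() follow the format string above, one line per initialized domain. An illustrative excerpt of the resulting powergate_info output (the domain set and the values are hypothetical):

GT.ref_count=1, GT.fwake=0x10001
Render.ref_count=0, Render.fwake=0x0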
+17 -13
drivers/gpu/drm/xe/xe_gt_mcr.c
··· 201 201 { 0x009680, 0x0096FF }, /* DSS */ 202 202 { 0x00D800, 0x00D87F }, /* SLICE */ 203 203 { 0x00DC00, 0x00DCFF }, /* SLICE */ 204 - { 0x00DE80, 0x00E8FF }, /* DSS (0xE000-0xE0FF reserved) */ 204 + { 0x00DE00, 0x00E8FF }, /* DSS (0xE000-0xE0FF reserved) */ 205 205 { 0x00E980, 0x00E9FF }, /* SLICE */ 206 206 { 0x013000, 0x0133FF }, /* DSS (0x13000-0x131FF), SLICE (0x13200-0x133FF) */ 207 207 {}, ··· 277 277 { 0x00B500, 0x00B6FF }, /* PSMI */ 278 278 { 0x00C800, 0x00CFFF }, /* GAMCTRL */ 279 279 { 0x00F000, 0x00F0FF }, /* GAMCTRL */ 280 + {}, 281 + }; 282 + 283 + static const struct xe_mmio_range xe3p_lpg_instance0_steering_table[] = { 284 + { 0x004000, 0x004AFF }, /* GAM, rsvd, GAMWKR */ 285 + { 0x008700, 0x00887F }, /* NODE */ 286 + { 0x00B000, 0x00B3FF }, /* NODE, L3BANK */ 287 + { 0x00B500, 0x00B6FF }, /* PSMI */ 288 + { 0x00C800, 0x00CFFF }, /* GAM */ 289 + { 0x00D880, 0x00D8FF }, /* NODE */ 290 + { 0x00DD00, 0x00DD7F }, /* MEMPIPE */ 291 + { 0x00F000, 0x00FFFF }, /* GAM, GAMWKR */ 292 + { 0x013400, 0x0135FF }, /* MEMPIPE */ 280 293 {}, 281 294 }; 282 295 ··· 518 505 519 506 spin_lock_init(&gt->mcr_lock); 520 507 521 - if (IS_SRIOV_VF(xe)) 522 - return; 523 - 524 508 if (gt->info.type == XE_GT_TYPE_MEDIA) { 525 509 drm_WARN_ON(&xe->drm, MEDIA_VER(xe) < 13); 526 510 ··· 532 522 } 533 523 } else { 534 524 if (GRAPHICS_VERx100(xe) == 3511) { 535 - /* 536 - * TODO: there are some ranges in bspec with missing 537 - * termination: [0x00B000, 0x00B0FF] and 538 - * [0x00D880, 0x00D8FF] (NODE); [0x00B100, 0x00B3FF] 539 - * (L3BANK). Update them here once bspec is updated. 540 - */ 541 525 gt->steering[DSS].ranges = xe3p_xpc_xecore_steering_table; 542 526 gt->steering[GAM1].ranges = xe3p_xpc_gam_grp1_steering_table; 543 527 gt->steering[INSTANCE0].ranges = xe3p_xpc_instance0_steering_table; 544 528 gt->steering[L3BANK].ranges = xelpg_l3bank_steering_table; 545 529 gt->steering[NODE].ranges = xe3p_xpc_node_steering_table; 530 + } else if (GRAPHICS_VERx100(xe) >= 3510) { 531 + gt->steering[DSS].ranges = xe2lpg_dss_steering_table; 532 + gt->steering[INSTANCE0].ranges = xe3p_lpg_instance0_steering_table; 546 533 } else if (GRAPHICS_VER(xe) >= 20) { 547 534 gt->steering[DSS].ranges = xe2lpg_dss_steering_table; 548 535 gt->steering[SQIDI_PSMI].ranges = xe2lpg_sqidi_psmi_steering_table; ··· 575 568 */ 576 569 void xe_gt_mcr_init(struct xe_gt *gt) 577 570 { 578 - if (IS_SRIOV_VF(gt_to_xe(gt))) 579 - return; 580 - 581 571 /* Select non-terminated steering target for each type */ 582 572 for (int i = 0; i < NUM_STEERING_TYPES; i++) { 583 573 gt->steering[i].initialized = true;
+184 -53
drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
··· 279 279 { 280 280 struct xe_ggtt_node *node = config->ggtt_region; 281 281 282 - if (!xe_ggtt_node_allocated(node)) 282 + if (!node) 283 283 return 0; 284 284 285 285 return encode_ggtt(cfg, xe_ggtt_node_addr(node), xe_ggtt_node_size(node), details); ··· 482 482 return err ?: err2; 483 483 } 484 484 485 - static void pf_release_ggtt(struct xe_tile *tile, struct xe_ggtt_node *node) 486 - { 487 - if (xe_ggtt_node_allocated(node)) { 488 - /* 489 - * explicit GGTT PTE assignment to the PF using xe_ggtt_assign() 490 - * is redundant, as PTE will be implicitly re-assigned to PF by 491 - * the xe_ggtt_clear() called by below xe_ggtt_remove_node(). 492 - */ 493 - xe_ggtt_node_remove(node, false); 494 - } else { 495 - xe_ggtt_node_fini(node); 496 - } 497 - } 498 - 499 485 static void pf_release_vf_config_ggtt(struct xe_gt *gt, struct xe_gt_sriov_config *config) 500 486 { 501 - pf_release_ggtt(gt_to_tile(gt), config->ggtt_region); 487 + xe_ggtt_node_remove(config->ggtt_region, false); 502 488 config->ggtt_region = NULL; 503 489 } 504 490 ··· 503 517 504 518 size = round_up(size, alignment); 505 519 506 - if (xe_ggtt_node_allocated(config->ggtt_region)) { 520 + if (config->ggtt_region) { 507 521 err = pf_distribute_config_ggtt(tile, vfid, 0, 0); 508 522 if (unlikely(err)) 509 523 return err; ··· 514 528 if (unlikely(err)) 515 529 return err; 516 530 } 517 - xe_gt_assert(gt, !xe_ggtt_node_allocated(config->ggtt_region)); 531 + xe_gt_assert(gt, !config->ggtt_region); 518 532 519 533 if (!size) 520 534 return 0; 521 535 522 - node = xe_ggtt_node_init(ggtt); 536 + node = xe_ggtt_insert_node(ggtt, size, alignment); 523 537 if (IS_ERR(node)) 524 538 return PTR_ERR(node); 525 - 526 - err = xe_ggtt_node_insert(node, size, alignment); 527 - if (unlikely(err)) 528 - goto err; 529 539 530 540 xe_ggtt_assign(node, vfid); 531 541 xe_gt_sriov_dbg_verbose(gt, "VF%u assigned GGTT %llx-%llx\n", ··· 534 552 config->ggtt_region = node; 535 553 return 0; 536 554 err: 537 - pf_release_ggtt(tile, node); 555 + xe_ggtt_node_remove(node, false); 538 556 return err; 539 557 } 540 558 ··· 544 562 struct xe_ggtt_node *node = config->ggtt_region; 545 563 546 564 xe_gt_assert(gt, xe_gt_is_main_type(gt)); 547 - return xe_ggtt_node_allocated(node) ? xe_ggtt_node_size(node) : 0; 565 + return node ? xe_ggtt_node_size(node) : 0; 548 566 } 549 567 550 568 /** ··· 1451 1469 1452 1470 static u64 pf_get_lmem_alignment(struct xe_gt *gt) 1453 1471 { 1454 - /* this might be platform dependent */ 1455 - return SZ_2M; 1472 + return xe_device_has_lmtt(gt_to_xe(gt)) ? 
1473 + xe_lmtt_page_size(&gt_to_tile(gt)->sriov.pf.lmtt) : XE_PAGE_SIZE; 1456 1474 } 1457 1475 1458 1476 static u64 pf_get_min_spare_lmem(struct xe_gt *gt) ··· 1627 1645 struct xe_device *xe = gt_to_xe(gt); 1628 1646 struct xe_tile *tile = gt_to_tile(gt); 1629 1647 struct xe_bo *bo; 1648 + u64 alignment; 1630 1649 int err; 1631 1650 1632 1651 xe_gt_assert(gt, vfid); 1633 1652 xe_gt_assert(gt, IS_DGFX(xe)); 1634 1653 xe_gt_assert(gt, xe_gt_is_main_type(gt)); 1635 1654 1636 - size = round_up(size, pf_get_lmem_alignment(gt)); 1655 + alignment = pf_get_lmem_alignment(gt); 1656 + size = round_up(size, alignment); 1637 1657 1638 1658 if (config->lmem_obj) { 1639 1659 err = pf_distribute_config_lmem(gt, vfid, 0); ··· 1651 1667 if (!size) 1652 1668 return 0; 1653 1669 1654 - xe_gt_assert(gt, pf_get_lmem_alignment(gt) == SZ_2M); 1670 + xe_gt_assert(gt, alignment == XE_PAGE_SIZE || alignment == SZ_2M); 1655 1671 bo = xe_bo_create_pin_range_novm(xe, tile, 1656 1672 ALIGN(size, PAGE_SIZE), 0, ~0ull, 1657 1673 ttm_bo_type_kernel, 1658 - XE_BO_FLAG_VRAM_IF_DGFX(tile) | 1659 - XE_BO_FLAG_NEEDS_2M | 1674 + XE_BO_FLAG_VRAM(tile->mem.vram) | 1675 + (alignment == SZ_2M ? XE_BO_FLAG_NEEDS_2M : 0) | 1660 1676 XE_BO_FLAG_PINNED | 1661 1677 XE_BO_FLAG_PINNED_LATE_RESTORE | 1662 1678 XE_BO_FLAG_FORCE_USER_VRAM); ··· 1738 1754 } 1739 1755 1740 1756 /** 1741 - * xe_gt_sriov_pf_config_bulk_set_lmem - Provision many VFs with LMEM. 1757 + * xe_gt_sriov_pf_config_bulk_set_lmem_locked() - Provision many VFs with LMEM. 1758 + * @gt: the &xe_gt (can't be media) 1759 + * @vfid: starting VF identifier (can't be 0) 1760 + * @num_vfs: number of VFs to provision 1761 + * @size: requested LMEM size 1762 + * 1763 + * This function can only be called on PF. 1764 + * 1765 + * Return: 0 on success or a negative error code on failure. 1766 + */ 1767 + int xe_gt_sriov_pf_config_bulk_set_lmem_locked(struct xe_gt *gt, unsigned int vfid, 1768 + unsigned int num_vfs, u64 size) 1769 + { 1770 + unsigned int n; 1771 + int err = 0; 1772 + 1773 + lockdep_assert_held(xe_gt_sriov_pf_master_mutex(gt)); 1774 + xe_gt_assert(gt, xe_device_has_lmtt(gt_to_xe(gt))); 1775 + xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt))); 1776 + xe_gt_assert(gt, xe_gt_is_main_type(gt)); 1777 + xe_gt_assert(gt, vfid); 1778 + 1779 + if (!num_vfs) 1780 + return 0; 1781 + 1782 + for (n = vfid; n < vfid + num_vfs; n++) { 1783 + err = pf_provision_vf_lmem(gt, n, size); 1784 + if (err) 1785 + break; 1786 + } 1787 + 1788 + return pf_config_bulk_set_u64_done(gt, vfid, num_vfs, size, 1789 + pf_get_vf_config_lmem, 1790 + "LMEM", n, err); 1791 + } 1792 + 1793 + /** 1794 + * xe_gt_sriov_pf_config_bulk_set_lmem() - Provision many VFs with LMEM. 1742 1795 * @gt: the &xe_gt (can't be media) 1743 1796 * @vfid: starting VF identifier (can't be 0) 1744 1797 * @num_vfs: number of VFs to provision ··· 1788 1767 int xe_gt_sriov_pf_config_bulk_set_lmem(struct xe_gt *gt, unsigned int vfid, 1789 1768 unsigned int num_vfs, u64 size) 1790 1769 { 1791 - unsigned int n; 1792 - int err = 0; 1770 + guard(mutex)(xe_gt_sriov_pf_master_mutex(gt)); 1793 1771 1772 + return xe_gt_sriov_pf_config_bulk_set_lmem_locked(gt, vfid, num_vfs, size); 1773 + } 1774 + 1775 + /** 1776 + * xe_gt_sriov_pf_config_get_lmem_locked() - Get VF's LMEM quota. 1777 + * @gt: the &xe_gt 1778 + * @vfid: the VF identifier (can't be 0 == PFID) 1779 + * 1780 + * This function can only be called on PF. 1781 + * 1782 + * Return: VF's LMEM quota. 
1783 + */ 1784 + u64 xe_gt_sriov_pf_config_get_lmem_locked(struct xe_gt *gt, unsigned int vfid) 1785 + { 1786 + lockdep_assert_held(xe_gt_sriov_pf_master_mutex(gt)); 1787 + xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt))); 1794 1788 xe_gt_assert(gt, vfid); 1789 + 1790 + return pf_get_vf_config_lmem(gt, vfid); 1791 + } 1792 + 1793 + /** 1794 + * xe_gt_sriov_pf_config_set_lmem_locked() - Provision VF with LMEM. 1795 + * @gt: the &xe_gt (can't be media) 1796 + * @vfid: the VF identifier (can't be 0 == PFID) 1797 + * @size: requested LMEM size 1798 + * 1799 + * This function can only be called on PF. 1800 + */ 1801 + int xe_gt_sriov_pf_config_set_lmem_locked(struct xe_gt *gt, unsigned int vfid, u64 size) 1802 + { 1803 + int err; 1804 + 1805 + lockdep_assert_held(xe_gt_sriov_pf_master_mutex(gt)); 1806 + xe_gt_assert(gt, xe_device_has_lmtt(gt_to_xe(gt))); 1807 + xe_gt_assert(gt, IS_SRIOV_PF(gt_to_xe(gt))); 1795 1808 xe_gt_assert(gt, xe_gt_is_main_type(gt)); 1809 + xe_gt_assert(gt, vfid); 1796 1810 1797 - if (!num_vfs) 1798 - return 0; 1811 + err = pf_provision_vf_lmem(gt, vfid, size); 1799 1812 1800 - mutex_lock(xe_gt_sriov_pf_master_mutex(gt)); 1801 - for (n = vfid; n < vfid + num_vfs; n++) { 1802 - err = pf_provision_vf_lmem(gt, n, size); 1803 - if (err) 1804 - break; 1805 - } 1806 - mutex_unlock(xe_gt_sriov_pf_master_mutex(gt)); 1807 - 1808 - return pf_config_bulk_set_u64_done(gt, vfid, num_vfs, size, 1809 - xe_gt_sriov_pf_config_get_lmem, 1810 - "LMEM", n, err); 1813 + return pf_config_set_u64_done(gt, vfid, size, 1814 + pf_get_vf_config_lmem(gt, vfid), 1815 + "LMEM", err); 1811 1816 } 1812 1817 1813 1818 static struct xe_bo *pf_get_vf_config_lmem_obj(struct xe_gt *gt, unsigned int vfid) ··· 1903 1856 return fair; 1904 1857 } 1905 1858 1859 + static u64 pf_profile_fair_lmem(struct xe_gt *gt, unsigned int num_vfs) 1860 + { 1861 + struct xe_tile *tile = gt_to_tile(gt); 1862 + bool admin_only_pf = xe_sriov_pf_admin_only(tile->xe); 1863 + u64 usable = xe_vram_region_usable_size(tile->mem.vram); 1864 + u64 spare = pf_get_min_spare_lmem(gt); 1865 + u64 available = usable > spare ? usable - spare : 0; 1866 + u64 shareable = ALIGN_DOWN(available, SZ_1G); 1867 + u64 alignment = pf_get_lmem_alignment(gt); 1868 + u64 fair; 1869 + 1870 + if (admin_only_pf) 1871 + fair = div_u64(shareable, num_vfs); 1872 + else 1873 + fair = div_u64(shareable, 1 + num_vfs); 1874 + 1875 + if (!admin_only_pf && fair) 1876 + fair = rounddown_pow_of_two(fair); 1877 + 1878 + return ALIGN_DOWN(fair, alignment); 1879 + } 1880 + 1881 + static void __pf_show_provisioning_lmem(struct xe_gt *gt, unsigned int first_vf, 1882 + unsigned int num_vfs, bool provisioned) 1883 + { 1884 + unsigned int allvfs = 1 + xe_gt_sriov_pf_get_totalvfs(gt); /* PF plus VFs */ 1885 + unsigned long *bitmap __free(bitmap) = bitmap_zalloc(allvfs, GFP_KERNEL); 1886 + unsigned int weight; 1887 + unsigned int n; 1888 + 1889 + if (!bitmap) 1890 + return; 1891 + 1892 + for (n = first_vf; n < first_vf + num_vfs; n++) { 1893 + if (!!pf_get_vf_config_lmem(gt, VFID(n)) == provisioned) 1894 + bitmap_set(bitmap, n, 1); 1895 + } 1896 + 1897 + weight = bitmap_weight(bitmap, allvfs); 1898 + if (!weight) 1899 + return; 1900 + 1901 + xe_gt_sriov_info(gt, "VF%s%*pbl %s provisioned with VRAM\n", 1902 + weight > 1 ? "s " : "", allvfs, bitmap, 1903 + provisioned ? 
"already" : "not"); 1904 + } 1905 + 1906 + static void pf_show_all_provisioned_lmem(struct xe_gt *gt) 1907 + { 1908 + __pf_show_provisioning_lmem(gt, VFID(1), xe_gt_sriov_pf_get_totalvfs(gt), true); 1909 + } 1910 + 1911 + static void pf_show_unprovisioned_lmem(struct xe_gt *gt, unsigned int first_vf, 1912 + unsigned int num_vfs) 1913 + { 1914 + __pf_show_provisioning_lmem(gt, first_vf, num_vfs, false); 1915 + } 1916 + 1917 + static bool pf_needs_provision_lmem(struct xe_gt *gt, unsigned int first_vf, 1918 + unsigned int num_vfs) 1919 + { 1920 + unsigned int vfid; 1921 + 1922 + for (vfid = first_vf; vfid < first_vf + num_vfs; vfid++) { 1923 + if (pf_get_vf_config_lmem(gt, vfid)) { 1924 + pf_show_all_provisioned_lmem(gt); 1925 + pf_show_unprovisioned_lmem(gt, first_vf, num_vfs); 1926 + return false; 1927 + } 1928 + } 1929 + 1930 + pf_show_all_provisioned_lmem(gt); 1931 + return true; 1932 + } 1933 + 1906 1934 /** 1907 1935 * xe_gt_sriov_pf_config_set_fair_lmem - Provision many VFs with fair LMEM. 1908 1936 * @gt: the &xe_gt (can't be media) ··· 1991 1869 int xe_gt_sriov_pf_config_set_fair_lmem(struct xe_gt *gt, unsigned int vfid, 1992 1870 unsigned int num_vfs) 1993 1871 { 1872 + u64 profile; 1994 1873 u64 fair; 1995 1874 1996 1875 xe_gt_assert(gt, vfid); ··· 2001 1878 if (!xe_device_has_lmtt(gt_to_xe(gt))) 2002 1879 return 0; 2003 1880 2004 - mutex_lock(xe_gt_sriov_pf_master_mutex(gt)); 2005 - fair = pf_estimate_fair_lmem(gt, num_vfs); 2006 - mutex_unlock(xe_gt_sriov_pf_master_mutex(gt)); 1881 + guard(mutex)(xe_gt_sriov_pf_master_mutex(gt)); 2007 1882 1883 + if (!pf_needs_provision_lmem(gt, vfid, num_vfs)) 1884 + return 0; 1885 + 1886 + fair = pf_estimate_fair_lmem(gt, num_vfs); 2008 1887 if (!fair) 2009 1888 return -ENOSPC; 2010 1889 2011 - return xe_gt_sriov_pf_config_bulk_set_lmem(gt, vfid, num_vfs, fair); 1890 + profile = pf_profile_fair_lmem(gt, num_vfs); 1891 + fair = min(fair, profile); 1892 + if (fair < profile) 1893 + xe_gt_sriov_info(gt, "Using non-profile provisioning (%s %llu vs %llu)\n", 1894 + "VRAM", fair, profile); 1895 + 1896 + return xe_gt_sriov_pf_config_bulk_set_lmem_locked(gt, vfid, num_vfs, fair); 2012 1897 } 2013 1898 2014 1899 /** ··· 2707 2576 2708 2577 static void pf_sanitize_ggtt(struct xe_ggtt_node *ggtt_region, unsigned int vfid) 2709 2578 { 2710 - if (xe_ggtt_node_allocated(ggtt_region)) 2579 + if (ggtt_region) 2711 2580 xe_ggtt_assign(ggtt_region, vfid); 2712 2581 } 2713 2582 ··· 3166 3035 3167 3036 for (n = 1; n <= total_vfs; n++) { 3168 3037 config = &gt->sriov.pf.vfs[n].config; 3169 - if (!xe_ggtt_node_allocated(config->ggtt_region)) 3038 + if (!config->ggtt_region) 3170 3039 continue; 3171 3040 3172 3041 string_get_size(xe_ggtt_node_size(config->ggtt_region), 1, STRING_UNITS_2,
+4
drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h
··· 36 36 int xe_gt_sriov_pf_config_set_fair_lmem(struct xe_gt *gt, unsigned int vfid, unsigned int num_vfs); 37 37 int xe_gt_sriov_pf_config_bulk_set_lmem(struct xe_gt *gt, unsigned int vfid, unsigned int num_vfs, 38 38 u64 size); 39 + u64 xe_gt_sriov_pf_config_get_lmem_locked(struct xe_gt *gt, unsigned int vfid); 40 + int xe_gt_sriov_pf_config_set_lmem_locked(struct xe_gt *gt, unsigned int vfid, u64 size); 41 + int xe_gt_sriov_pf_config_bulk_set_lmem_locked(struct xe_gt *gt, unsigned int vfid, 42 + unsigned int num_vfs, u64 size); 39 43 struct xe_bo *xe_gt_sriov_pf_config_get_lmem_obj(struct xe_gt *gt, unsigned int vfid); 40 44 41 45 u32 xe_gt_sriov_pf_config_get_exec_quantum(struct xe_gt *gt, unsigned int vfid);
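Note: the new _locked entry points exported above assume the caller already holds the PF master mutex (the .c hunk asserts this with lockdep_assert_held()). A minimal caller-side sketch, with err/vfid as placeholders rather than code from this series:

        /* Sketch only: provision a VF's VRAM while already under the PF master mutex. */
        guard(mutex)(xe_gt_sriov_pf_master_mutex(gt));

        if (!xe_gt_sriov_pf_config_get_lmem_locked(gt, vfid))
                err = xe_gt_sriov_pf_config_set_lmem_locked(gt, vfid, SZ_1G);

For the fair provisioning path in the .c hunk, the profile quota follows pf_profile_fair_lmem(): for example, with 64 GiB of usable VRAM, no spare reserved and 3 VFs on a PF that also runs local workloads, the shareable pool is 64 GiB, each of the four participants gets 16 GiB, and the power-of-two clamp leaves that unchanged (the actual grant is still the minimum of this and the estimate-based value).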
+1 -1
drivers/gpu/drm/xe/xe_gt_sriov_pf_control.c
··· 1259 1259 } 1260 1260 1261 1261 /** 1262 - * xe_gt_sriov_pf_control_trigger restore_vf() - Start an SR-IOV VF migration data restore sequence. 1262 + * xe_gt_sriov_pf_control_trigger_restore_vf() - Start an SR-IOV VF migration data restore sequence. 1263 1263 * @gt: the &xe_gt 1264 1264 * @vfid: the VF identifier 1265 1265 *
+2
drivers/gpu/drm/xe/xe_gt_sriov_pf_service.c
··· 111 111 XE2_GT_COMPUTE_DSS_2, /* _MMIO(0x914c) */ 112 112 XE2_GT_GEOMETRY_DSS_1, /* _MMIO(0x9150) */ 113 113 XE2_GT_GEOMETRY_DSS_2, /* _MMIO(0x9154) */ 114 + XE3P_XPC_GT_GEOMETRY_DSS_3, /* _MMIO(0x915c) */ 115 + XE3P_XPC_GT_COMPUTE_DSS_3, /* _MMIO(0x9160) */ 114 116 SERVICE_COPY_ENABLE, /* _MMIO(0x9170) */ 115 117 }; 116 118
+65 -24
drivers/gpu/drm/xe/xe_gt_sriov_vf.c
··· 488 488 static int vf_get_ggtt_info(struct xe_gt *gt) 489 489 { 490 490 struct xe_tile *tile = gt_to_tile(gt); 491 - struct xe_ggtt *ggtt = tile->mem.ggtt; 492 491 struct xe_guc *guc = &gt->uc.guc; 493 492 u64 start, size, ggtt_size; 494 - s64 shift; 495 493 int err; 496 494 497 495 xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt))); 498 - 499 - guard(mutex)(&ggtt->lock); 500 496 501 497 err = guc_action_query_single_klv64(guc, GUC_KLV_VF_CFG_GGTT_START_KEY, &start); 502 498 if (unlikely(err)) ··· 505 509 if (!size) 506 510 return -ENODATA; 507 511 512 + xe_tile_sriov_vf_ggtt_base_store(tile, start); 508 513 ggtt_size = xe_tile_sriov_vf_ggtt(tile); 509 - if (ggtt_size && ggtt_size != size) { 514 + if (!ggtt_size) { 515 + /* 516 + * This function is called once during xe_guc_init_noalloc(), 517 + * at which point ggtt_size = 0 and we have to initialize everything, 518 + * and GGTT is not yet initialized. 519 + * 520 + * Return early as there's nothing to fixup. 521 + */ 522 + xe_tile_sriov_vf_ggtt_store(tile, size); 523 + return 0; 524 + } 525 + 526 + if (ggtt_size != size) { 510 527 xe_gt_sriov_err(gt, "Unexpected GGTT reassignment: %lluK != %lluK\n", 511 528 size / SZ_1K, ggtt_size / SZ_1K); 512 529 return -EREMCHG; ··· 528 519 xe_gt_sriov_dbg_verbose(gt, "GGTT %#llx-%#llx = %lluK\n", 529 520 start, start + size - 1, size / SZ_1K); 530 521 531 - shift = start - (s64)xe_tile_sriov_vf_ggtt_base(tile); 532 - xe_tile_sriov_vf_ggtt_base_store(tile, start); 533 - xe_tile_sriov_vf_ggtt_store(tile, size); 534 - 535 - if (shift && shift != start) { 536 - xe_gt_sriov_info(gt, "Shifting GGTT base by %lld to 0x%016llx\n", 537 - shift, start); 538 - xe_tile_sriov_vf_fixup_ggtt_nodes_locked(gt_to_tile(gt), shift); 539 - } 540 - 541 - if (xe_sriov_vf_migration_supported(gt_to_xe(gt))) { 542 - WRITE_ONCE(gt->sriov.vf.migration.ggtt_need_fixes, false); 543 - smp_wmb(); /* Ensure above write visible before wake */ 544 - wake_up_all(&gt->sriov.vf.migration.wq); 545 - } 522 + /* 523 + * This function can be called repeatedly from post migration fixups, 524 + * at which point we inform the GGTT of the new base address. 525 + * xe_ggtt_shift_nodes() may be called multiple times for each migration, 526 + * but will be a noop if the base is unchanged. 527 + */ 528 + xe_ggtt_shift_nodes(tile->mem.ggtt, start); 546 529 547 530 return 0; 548 531 } ··· 838 837 839 838 for_each_hw_engine(hwe, gt, id) 840 839 xe_default_lrc_update_memirq_regs_with_address(hwe); 840 + } 841 + 842 + static void vf_post_migration_mark_fixups_done(struct xe_gt *gt) 843 + { 844 + WRITE_ONCE(gt->sriov.vf.migration.ggtt_need_fixes, false); 845 + smp_wmb(); /* Ensure above write visible before wake */ 846 + wake_up_all(&gt->sriov.vf.migration.wq); 841 847 } 842 848 843 849 static void vf_start_migration_recovery(struct xe_gt *gt) ··· 1277 1269 if (err) 1278 1270 return err; 1279 1271 1272 + atomic_inc(&gt->sriov.vf.migration.fixups_complete_count); 1273 + 1280 1274 return 0; 1281 1275 } 1282 1276 ··· 1383 1373 if (err) 1384 1374 goto fail; 1385 1375 1376 + vf_post_migration_mark_fixups_done(gt); 1386 1377 vf_post_migration_rearm(gt); 1387 1378 1388 1379 err = vf_post_migration_resfix_done(gt, marker); ··· 1518 1507 } 1519 1508 1520 1509 /** 1521 - * xe_gt_sriov_vf_wait_valid_ggtt() - VF wait for valid GGTT addresses 1522 - * @gt: the &xe_gt 1510 + * xe_vf_migration_fixups_complete_count() - Get count of VF fixups completions. 
1511 + * @gt: the &xe_gt instance which contains affected Global GTT 1512 + * 1513 + * Return: number of times VF fixups were completed since driver 1514 + * probe, or 0 if migration is not available, or -1 if fixups are 1515 + * pending or being applied right now. 1523 1516 */ 1524 - void xe_gt_sriov_vf_wait_valid_ggtt(struct xe_gt *gt) 1517 + int xe_vf_migration_fixups_complete_count(struct xe_gt *gt) 1518 + { 1519 + if (!IS_SRIOV_VF(gt_to_xe(gt)) || 1520 + !xe_sriov_vf_migration_supported(gt_to_xe(gt))) 1521 + return 0; 1522 + 1523 + /* should never match fixups_complete_count value */ 1524 + if (!vf_valid_ggtt(gt)) 1525 + return -1; 1526 + 1527 + return atomic_read(&gt->sriov.vf.migration.fixups_complete_count); 1528 + } 1529 + 1530 + /** 1531 + * xe_gt_sriov_vf_wait_valid_ggtt() - wait for valid GGTT nodes and address refs 1532 + * @gt: the &xe_gt instance which contains affected Global GTT 1533 + * 1534 + * Return: number of times VF fixups were completed since driver 1535 + * probe, or 0 if migration is not available. 1536 + */ 1537 + int xe_gt_sriov_vf_wait_valid_ggtt(struct xe_gt *gt) 1525 1538 { 1526 1539 int ret; 1527 1540 1541 + /* 1542 + * this condition needs to be identical to one in 1543 + * xe_vf_migration_fixups_complete_count() 1544 + */ 1528 1545 if (!IS_SRIOV_VF(gt_to_xe(gt)) || 1529 1546 !xe_sriov_vf_migration_supported(gt_to_xe(gt))) 1530 - return; 1547 + return 0; 1531 1548 1532 1549 ret = wait_event_interruptible_timeout(gt->sriov.vf.migration.wq, 1533 1550 vf_valid_ggtt(gt), 1534 1551 HZ * 5); 1535 1552 xe_gt_WARN_ON(gt, !ret); 1553 + 1554 + return atomic_read(&gt->sriov.vf.migration.fixups_complete_count); 1536 1555 }
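Note: the count-returning API above suggests a check-and-retry pattern for VF code that caches GGTT addresses. This is only an illustrative sketch inferred from the kernel-doc, not a call site in this series:

        /* Detect whether a VF migration (and GGTT shift) happened in between. */
        int before = xe_vf_migration_fixups_complete_count(gt);

        /* ... work that consumed GGTT addresses ... */

        int after = xe_gt_sriov_vf_wait_valid_ggtt(gt);
        if (before < 0 || before != after) {
                /* fixups ran or were pending meanwhile; redo the GGTT-dependent work */
        }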
+2 -1
drivers/gpu/drm/xe/xe_gt_sriov_vf.h
··· 39 39 void xe_gt_sriov_vf_print_runtime(struct xe_gt *gt, struct drm_printer *p); 40 40 void xe_gt_sriov_vf_print_version(struct xe_gt *gt, struct drm_printer *p); 41 41 42 - void xe_gt_sriov_vf_wait_valid_ggtt(struct xe_gt *gt); 42 + int xe_gt_sriov_vf_wait_valid_ggtt(struct xe_gt *gt); 43 + int xe_vf_migration_fixups_complete_count(struct xe_gt *gt); 43 44 44 45 #endif
+3 -1
drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
··· 54 54 wait_queue_head_t wq; 55 55 /** @scratch: Scratch memory for VF recovery */ 56 56 void *scratch; 57 + /** @fixups_complete_count: Counts completed fixups stages */ 58 + atomic_t fixups_complete_count; 57 59 /** @debug: Debug hooks for delaying migration */ 58 60 struct { 59 61 /** ··· 75 73 bool recovery_queued; 76 74 /** @recovery_inprogress: VF post migration recovery in progress */ 77 75 bool recovery_inprogress; 78 - /** @ggtt_need_fixes: VF GGTT needs fixes */ 76 + /** @ggtt_need_fixes: VF GGTT and references to it need fixes */ 79 77 bool ggtt_need_fixes; 80 78 }; 81 79
+52 -12
drivers/gpu/drm/xe/xe_gt_stats.c
··· 3 3 * Copyright © 2024 Intel Corporation 4 4 */ 5 5 6 - #include <linux/atomic.h> 7 - 6 + #include <drm/drm_managed.h> 8 7 #include <drm/drm_print.h> 9 8 9 + #include "xe_device.h" 10 10 #include "xe_gt_stats.h" 11 - #include "xe_gt_types.h" 11 + 12 + static void xe_gt_stats_fini(struct drm_device *drm, void *arg) 13 + { 14 + struct xe_gt *gt = arg; 15 + 16 + free_percpu(gt->stats); 17 + } 18 + 19 + /** 20 + * xe_gt_stats_init() - Initialize GT statistics 21 + * @gt: GT structure 22 + * 23 + * Allocate per-CPU GT statistics. Using per-CPU stats allows increments 24 + * to occur without cross-CPU atomics. 25 + * 26 + * Return: 0 on success, -ENOMEM on failure. 27 + */ 28 + int xe_gt_stats_init(struct xe_gt *gt) 29 + { 30 + gt->stats = alloc_percpu(struct xe_gt_stats); 31 + if (!gt->stats) 32 + return -ENOMEM; 33 + 34 + return drmm_add_action_or_reset(&gt_to_xe(gt)->drm, xe_gt_stats_fini, 35 + gt); 36 + } 12 37 13 38 /** 14 39 * xe_gt_stats_incr - Increments the specified stats counter ··· 48 23 if (id >= __XE_GT_STATS_NUM_IDS) 49 24 return; 50 25 51 - atomic64_add(incr, &gt->stats.counters[id]); 26 + this_cpu_add(gt->stats->counters[id], incr); 52 27 } 53 28 54 29 #define DEF_STAT_STR(ID, name) [XE_GT_STATS_ID_##ID] = name ··· 60 35 DEF_STAT_STR(SVM_TLB_INVAL_US, "svm_tlb_inval_us"), 61 36 DEF_STAT_STR(VMA_PAGEFAULT_COUNT, "vma_pagefault_count"), 62 37 DEF_STAT_STR(VMA_PAGEFAULT_KB, "vma_pagefault_kb"), 38 + DEF_STAT_STR(INVALID_PREFETCH_PAGEFAULT_COUNT, "invalid_prefetch_pagefault_count"), 63 39 DEF_STAT_STR(SVM_4K_PAGEFAULT_COUNT, "svm_4K_pagefault_count"), 64 40 DEF_STAT_STR(SVM_64K_PAGEFAULT_COUNT, "svm_64K_pagefault_count"), 65 41 DEF_STAT_STR(SVM_2M_PAGEFAULT_COUNT, "svm_2M_pagefault_count"), ··· 120 94 { 121 95 enum xe_gt_stats_id id; 122 96 123 - for (id = 0; id < __XE_GT_STATS_NUM_IDS; ++id) 124 - drm_printf(p, "%s: %lld\n", stat_description[id], 125 - atomic64_read(&gt->stats.counters[id])); 97 + for (id = 0; id < __XE_GT_STATS_NUM_IDS; ++id) { 98 + u64 total = 0; 99 + int cpu; 100 + 101 + for_each_possible_cpu(cpu) { 102 + struct xe_gt_stats *s = per_cpu_ptr(gt->stats, cpu); 103 + 104 + total += s->counters[id]; 105 + } 106 + 107 + drm_printf(p, "%s: %lld\n", stat_description[id], total); 108 + } 126 109 127 110 return 0; 128 111 } 129 112 130 113 /** 131 - * xe_gt_stats_clear - Clear the GT stats 114 + * xe_gt_stats_clear() - Clear the GT stats 132 115 * @gt: GT structure 133 116 * 134 - * This clear (zeros) all the available GT stats. 117 + * Clear (zero) all available GT stats. Note that if the stats are being 118 + * updated while this function is running, the results may be unpredictable. 119 + * Intended to be called on an idle GPU. 135 120 */ 136 121 void xe_gt_stats_clear(struct xe_gt *gt) 137 122 { 138 - int id; 123 + int cpu; 139 124 140 - for (id = 0; id < ARRAY_SIZE(gt->stats.counters); ++id) 141 - atomic64_set(&gt->stats.counters[id], 0); 125 + for_each_possible_cpu(cpu) { 126 + struct xe_gt_stats *s = per_cpu_ptr(gt->stats, cpu); 127 + 128 + memset(s, 0, sizeof(*s)); 129 + } 142 130 }
+6
drivers/gpu/drm/xe/xe_gt_stats.h
··· 14 14 struct drm_printer; 15 15 16 16 #ifdef CONFIG_DEBUG_FS 17 + int xe_gt_stats_init(struct xe_gt *gt); 17 18 int xe_gt_stats_print_info(struct xe_gt *gt, struct drm_printer *p); 18 19 void xe_gt_stats_clear(struct xe_gt *gt); 19 20 void xe_gt_stats_incr(struct xe_gt *gt, const enum xe_gt_stats_id id, int incr); 20 21 #else 22 + static inline int xe_gt_stats_init(struct xe_gt *gt) 23 + { 24 + return 0; 25 + } 26 + 21 27 static inline void 22 28 xe_gt_stats_incr(struct xe_gt *gt, const enum xe_gt_stats_id id, 23 29 int incr)
+20
drivers/gpu/drm/xe/xe_gt_stats_types.h
··· 6 6 #ifndef _XE_GT_STATS_TYPES_H_ 7 7 #define _XE_GT_STATS_TYPES_H_ 8 8 9 + #include <linux/types.h> 10 + 9 11 enum xe_gt_stats_id { 10 12 XE_GT_STATS_ID_SVM_PAGEFAULT_COUNT, 11 13 XE_GT_STATS_ID_TLB_INVAL, ··· 15 13 XE_GT_STATS_ID_SVM_TLB_INVAL_US, 16 14 XE_GT_STATS_ID_VMA_PAGEFAULT_COUNT, 17 15 XE_GT_STATS_ID_VMA_PAGEFAULT_KB, 16 + XE_GT_STATS_ID_INVALID_PREFETCH_PAGEFAULT_COUNT, 18 17 XE_GT_STATS_ID_SVM_4K_PAGEFAULT_COUNT, 19 18 XE_GT_STATS_ID_SVM_64K_PAGEFAULT_COUNT, 20 19 XE_GT_STATS_ID_SVM_2M_PAGEFAULT_COUNT, ··· 60 57 /* must be the last entry */ 61 58 __XE_GT_STATS_NUM_IDS, 62 59 }; 60 + 61 + /** 62 + * struct xe_gt_stats - Per-CPU GT statistics counters 63 + * @counters: Array of 64-bit counters indexed by &enum xe_gt_stats_id 64 + * 65 + * This structure is used for high-frequency, per-CPU statistics collection 66 + * in the Xe driver. By using a per-CPU allocation and ensuring the structure 67 + * is cache-line aligned, we avoid the performance-heavy atomics and cache 68 + * coherency traffic. 69 + * 70 + * Updates to these counters should be performed using the this_cpu_add() 71 + * macro to ensure they are atomic with respect to local interrupts and 72 + * preemption-safe without the overhead of explicit locking. 73 + */ 74 + struct xe_gt_stats { 75 + u64 counters[__XE_GT_STATS_NUM_IDS]; 76 + } ____cacheline_aligned; 63 77 64 78 #endif
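Note: the per-CPU conversion keeps the existing xe_gt_stats_incr() interface, so adding a counter remains a two-step edit, as the new INVALID_PREFETCH_PAGEFAULT_COUNT entry shows. A sketch with a purely hypothetical counter name:

        /* xe_gt_stats_types.h: new id before __XE_GT_STATS_NUM_IDS (hypothetical name) */
        XE_GT_STATS_ID_EXAMPLE_EVENT_COUNT,

        /* xe_gt_stats.c: matching debugfs label */
        DEF_STAT_STR(EXAMPLE_EVENT_COUNT, "example_event_count"),

        /* hot paths then increment with a cheap per-CPU add, no cross-CPU atomics */
        xe_gt_stats_incr(gt, XE_GT_STATS_ID_EXAMPLE_EVENT_COUNT, 1);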
+9 -30
drivers/gpu/drm/xe/xe_gt_topology.c
··· 205 205 } 206 206 } 207 207 208 - static void 209 - get_num_dss_regs(struct xe_device *xe, int *geometry_regs, int *compute_regs) 210 - { 211 - if (GRAPHICS_VER(xe) > 20) { 212 - *geometry_regs = 3; 213 - *compute_regs = 3; 214 - } else if (GRAPHICS_VERx100(xe) == 1260) { 215 - *geometry_regs = 0; 216 - *compute_regs = 2; 217 - } else if (GRAPHICS_VERx100(xe) >= 1250) { 218 - *geometry_regs = 1; 219 - *compute_regs = 1; 220 - } else { 221 - *geometry_regs = 1; 222 - *compute_regs = 0; 223 - } 224 - } 225 - 226 208 void 227 209 xe_gt_topology_init(struct xe_gt *gt) 228 210 { ··· 212 230 XELP_GT_GEOMETRY_DSS_ENABLE, 213 231 XE2_GT_GEOMETRY_DSS_1, 214 232 XE2_GT_GEOMETRY_DSS_2, 233 + XE3P_XPC_GT_GEOMETRY_DSS_3, 215 234 }; 216 235 static const struct xe_reg compute_regs[] = { 217 236 XEHP_GT_COMPUTE_DSS_ENABLE, 218 237 XEHPC_GT_COMPUTE_DSS_ENABLE_EXT, 219 238 XE2_GT_COMPUTE_DSS_2, 239 + XE3P_XPC_GT_COMPUTE_DSS_3, 220 240 }; 221 - int num_geometry_regs, num_compute_regs; 222 - struct xe_device *xe = gt_to_xe(gt); 223 241 struct drm_printer p; 224 - 225 - get_num_dss_regs(xe, &num_geometry_regs, &num_compute_regs); 226 242 227 243 /* 228 244 * Register counts returned shouldn't exceed the number of registers 229 245 * passed as parameters below. 230 246 */ 231 - xe_gt_assert(gt, num_geometry_regs <= ARRAY_SIZE(geometry_regs)); 232 - xe_gt_assert(gt, num_compute_regs <= ARRAY_SIZE(compute_regs)); 247 + xe_gt_assert(gt, gt->info.num_geometry_xecore_fuse_regs <= ARRAY_SIZE(geometry_regs)); 248 + xe_gt_assert(gt, gt->info.num_compute_xecore_fuse_regs <= ARRAY_SIZE(compute_regs)); 233 249 234 250 load_dss_mask(gt, gt->fuse_topo.g_dss_mask, 235 - num_geometry_regs, geometry_regs); 251 + gt->info.num_geometry_xecore_fuse_regs, geometry_regs); 236 252 load_dss_mask(gt, gt->fuse_topo.c_dss_mask, 237 - num_compute_regs, compute_regs); 253 + gt->info.num_compute_xecore_fuse_regs, compute_regs); 238 254 239 255 load_eu_mask(gt, gt->fuse_topo.eu_mask_per_dss, &gt->fuse_topo.eu_type); 240 256 load_l3_bank_mask(gt, gt->fuse_topo.l3_bank_mask); ··· 310 330 */ 311 331 bool xe_gt_topology_has_dss_in_quadrant(struct xe_gt *gt, int quad) 312 332 { 313 - struct xe_device *xe = gt_to_xe(gt); 314 333 xe_dss_mask_t all_dss; 315 - int g_dss_regs, c_dss_regs, dss_per_quad, quad_first; 334 + int dss_per_quad, quad_first; 316 335 317 336 bitmap_or(all_dss, gt->fuse_topo.g_dss_mask, gt->fuse_topo.c_dss_mask, 318 337 XE_MAX_DSS_FUSE_BITS); 319 338 320 - get_num_dss_regs(xe, &g_dss_regs, &c_dss_regs); 321 - dss_per_quad = 32 * max(g_dss_regs, c_dss_regs) / 4; 339 + dss_per_quad = 32 * max(gt->info.num_geometry_xecore_fuse_regs, 340 + gt->info.num_compute_xecore_fuse_regs) / 4; 322 341 323 342 quad_first = xe_dss_mask_group_ffs(all_dss, dss_per_quad, quad); 324 343
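Note: as a quick check of the quadrant math above, dss_per_quad is now 32 * max(num_geometry_xecore_fuse_regs, num_compute_xecore_fuse_regs) / 4, so a GT reporting four fuse registers (the case the new XE3P_XPC_*_DSS_3 entries allow) gets 32 * 4 / 4 = 32 XeCores per quadrant, whereas the removed get_num_dss_regs() path gave 32 * 3 / 4 = 24 for Xe2 and later. These counts only illustrate the formula, not any particular SKU's fused-off topology.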
+12 -10
drivers/gpu/drm/xe/xe_gt_types.h
··· 35 35 XE_GT_EU_TYPE_SIMD16, 36 36 }; 37 37 38 - #define XE_MAX_DSS_FUSE_REGS 3 38 + #define XE_MAX_DSS_FUSE_REGS 4 39 39 #define XE_MAX_DSS_FUSE_BITS (32 * XE_MAX_DSS_FUSE_REGS) 40 40 #define XE_MAX_EU_FUSE_REGS 1 41 41 #define XE_MAX_EU_FUSE_BITS (32 * XE_MAX_EU_FUSE_REGS) ··· 44 44 typedef unsigned long xe_dss_mask_t[BITS_TO_LONGS(XE_MAX_DSS_FUSE_BITS)]; 45 45 typedef unsigned long xe_eu_mask_t[BITS_TO_LONGS(XE_MAX_EU_FUSE_BITS)]; 46 46 typedef unsigned long xe_l3_bank_mask_t[BITS_TO_LONGS(XE_MAX_L3_BANK_MASK_BITS)]; 47 - 48 - struct xe_mmio_range { 49 - u32 start; 50 - u32 end; 51 - }; 52 47 53 48 /* 54 49 * The hardware has multiple kinds of multicast register ranges that need ··· 144 149 u8 id; 145 150 /** @info.has_indirect_ring_state: GT has indirect ring state support */ 146 151 u8 has_indirect_ring_state:1; 152 + /** 153 + * @info.num_geometry_xecore_fuse_regs: Number of 32b-bit fuse 154 + * registers the geometry XeCore mask spans. 155 + */ 156 + u8 num_geometry_xecore_fuse_regs; 157 + /** 158 + * @info.num_compute_xecore_fuse_regs: Number of 32b-bit fuse 159 + * registers the compute XeCore mask spans. 160 + */ 161 + u8 num_compute_xecore_fuse_regs; 147 162 } info; 148 163 149 164 #if IS_ENABLED(CONFIG_DEBUG_FS) 150 165 /** @stats: GT stats */ 151 - struct { 152 - /** @stats.counters: counters for various GT stats */ 153 - atomic64_t counters[__XE_GT_STATS_NUM_IDS]; 154 - } stats; 166 + struct xe_gt_stats __percpu *stats; 155 167 #endif 156 168 157 169 /**
+73 -10
drivers/gpu/drm/xe/xe_guc.c
··· 35 35 #include "xe_guc_klv_helpers.h" 36 36 #include "xe_guc_log.h" 37 37 #include "xe_guc_pc.h" 38 + #include "xe_guc_rc.h" 38 39 #include "xe_guc_relay.h" 39 40 #include "xe_guc_submit.h" 40 41 #include "xe_memirq.h" 41 42 #include "xe_mmio.h" 42 43 #include "xe_platform_types.h" 44 + #include "xe_sleep.h" 43 45 #include "xe_sriov.h" 44 46 #include "xe_sriov_pf_migration.h" 45 47 #include "xe_uc.h" ··· 212 210 if (XE_GT_WA(gt, 18020744125) && 213 211 !xe_hw_engine_mask_per_class(gt, XE_ENGINE_CLASS_RENDER)) 214 212 flags |= GUC_WA_RCS_REGS_IN_CCS_REGS_LIST; 215 - 216 - if (XE_GT_WA(gt, 1509372804)) 217 - flags |= GUC_WA_RENDER_RST_RC6_EXIT; 218 213 219 214 if (XE_GT_WA(gt, 14018913170)) 220 215 flags |= GUC_WA_ENABLE_TSC_CHECK_ON_RC6; ··· 667 668 guc_g2g_fini(guc); 668 669 } 669 670 671 + static void vf_guc_fini_hw(void *arg) 672 + { 673 + struct xe_guc *guc = arg; 674 + 675 + xe_gt_sriov_vf_reset(guc_to_gt(guc)); 676 + } 677 + 670 678 /** 671 679 * xe_guc_comm_init_early - early initialization of GuC communication 672 680 * @guc: the &xe_guc to initialize ··· 778 772 xe->info.has_page_reclaim_hw_assist = false; 779 773 780 774 if (IS_SRIOV_VF(xe)) { 775 + ret = devm_add_action_or_reset(xe->drm.dev, vf_guc_fini_hw, guc); 776 + if (ret) 777 + goto out; 778 + 781 779 ret = xe_guc_ct_init(&guc->ct); 782 780 if (ret) 783 781 goto out; ··· 879 869 if (ret) 880 870 return ret; 881 871 872 + ret = xe_guc_rc_init(guc); 873 + if (ret) 874 + return ret; 875 + 882 876 ret = xe_guc_engine_activity_init(guc); 883 877 if (ret) 884 878 return ret; ··· 914 900 return xe_guc_submit_enable(guc); 915 901 } 916 902 903 + /* 904 + * Wa_14025883347: Prevent GuC firmware DMA failures during GuC-only reset by ensuring 905 + * SRAM save/restore operations are complete before reset. 
906 + */ 907 + static void guc_prevent_fw_dma_failure_on_reset(struct xe_guc *guc) 908 + { 909 + struct xe_gt *gt = guc_to_gt(guc); 910 + u32 boot_hash_chk, guc_status, sram_status; 911 + int ret; 912 + 913 + guc_status = xe_mmio_read32(&gt->mmio, GUC_STATUS); 914 + if (guc_status & GS_MIA_IN_RESET) 915 + return; 916 + 917 + boot_hash_chk = xe_mmio_read32(&gt->mmio, BOOT_HASH_CHK); 918 + if (!(boot_hash_chk & GUC_BOOT_UKERNEL_VALID)) 919 + return; 920 + 921 + /* Disable idle flow during reset (GuC reset re-enables it automatically) */ 922 + xe_mmio_rmw32(&gt->mmio, GUC_MAX_IDLE_COUNT, 0, GUC_IDLE_FLOW_DISABLE); 923 + 924 + ret = xe_mmio_wait32(&gt->mmio, GUC_STATUS, GS_UKERNEL_MASK, 925 + FIELD_PREP(GS_UKERNEL_MASK, XE_GUC_LOAD_STATUS_READY), 926 + 100000, &guc_status, false); 927 + if (ret) 928 + xe_gt_warn(gt, "GuC not ready after disabling idle flow (GUC_STATUS: 0x%x)\n", 929 + guc_status); 930 + 931 + ret = xe_mmio_wait32(&gt->mmio, GUC_SRAM_STATUS, GUC_SRAM_HANDLING_MASK, 932 + 0, 5000, &sram_status, false); 933 + if (ret) 934 + xe_gt_warn(gt, "SRAM handling not complete (GUC_SRAM_STATUS: 0x%x)\n", 935 + sram_status); 936 + } 937 + 917 938 int xe_guc_reset(struct xe_guc *guc) 918 939 { 919 940 struct xe_gt *gt = guc_to_gt(guc); ··· 960 911 961 912 if (IS_SRIOV_VF(gt_to_xe(gt))) 962 913 return xe_gt_sriov_vf_bootstrap(gt); 914 + 915 + if (XE_GT_WA(gt, 14025883347)) 916 + guc_prevent_fw_dma_failure_on_reset(guc); 963 917 964 918 xe_mmio_write32(mmio, GDRST, GRDOM_GUC); 965 919 ··· 1440 1388 return xe_guc_ct_send_block(&guc->ct, action, ARRAY_SIZE(action)); 1441 1389 } 1442 1390 1391 + #define MAX_RETRIES_ON_FLR 2 1392 + #define MIN_SLEEP_MS_ON_FLR 256 1393 + 1443 1394 int xe_guc_mmio_send_recv(struct xe_guc *guc, const u32 *request, 1444 1395 u32 len, u32 *response_buf) 1445 1396 { 1446 1397 struct xe_device *xe = guc_to_xe(guc); 1447 1398 struct xe_gt *gt = guc_to_gt(guc); 1448 1399 struct xe_mmio *mmio = &gt->mmio; 1449 - u32 header, reply; 1450 1400 struct xe_reg reply_reg = xe_gt_is_media_type(gt) ? 
1451 1401 MED_VF_SW_FLAG(0) : VF_SW_FLAG(0); 1452 1402 const u32 LAST_INDEX = VF_SW_FLAG_COUNT - 1; 1453 - bool lost = false; 1403 + unsigned int sleep_period_ms = 1; 1404 + unsigned int lost = 0; 1405 + u32 header; 1454 1406 int ret; 1455 1407 int i; 1456 1408 ··· 1486 1430 1487 1431 ret = xe_mmio_wait32(mmio, reply_reg, GUC_HXG_MSG_0_ORIGIN, 1488 1432 FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_GUC), 1489 - 50000, &reply, false); 1433 + 50000, &header, false); 1490 1434 if (ret) { 1491 1435 /* scratch registers might be cleared during FLR, try once more */ 1492 - if (!reply && !lost) { 1436 + if (!header) { 1437 + if (++lost > MAX_RETRIES_ON_FLR) { 1438 + xe_gt_err(gt, "GuC mmio request %#x: lost, too many retries %u\n", 1439 + request[0], lost); 1440 + return -ENOLINK; 1441 + } 1493 1442 xe_gt_dbg(gt, "GuC mmio request %#x: lost, trying again\n", request[0]); 1494 - lost = true; 1443 + xe_sleep_relaxed_ms(MIN_SLEEP_MS_ON_FLR); 1495 1444 goto retry; 1496 1445 } 1497 1446 timeout: 1498 1447 xe_gt_err(gt, "GuC mmio request %#x: no reply %#x\n", 1499 - request[0], reply); 1448 + request[0], header); 1500 1449 return ret; 1501 1450 } 1502 1451 1503 - header = xe_mmio_read32(mmio, reply_reg); 1504 1452 if (FIELD_GET(GUC_HXG_MSG_0_TYPE, header) == 1505 1453 GUC_HXG_TYPE_NO_RESPONSE_BUSY) { 1506 1454 /* ··· 1540 1480 1541 1481 xe_gt_dbg(gt, "GuC mmio request %#x: retrying, reason %#x\n", 1542 1482 request[0], reason); 1483 + 1484 + xe_sleep_exponential_ms(&sleep_period_ms, 256); 1543 1485 goto retry; 1544 1486 } 1545 1487 ··· 1671 1609 if (!IS_SRIOV_VF(guc_to_xe(guc))) { 1672 1610 int err; 1673 1611 1612 + xe_guc_rc_disable(guc); 1674 1613 err = xe_guc_pc_stop(&guc->pc); 1675 1614 xe_gt_WARN(guc_to_gt(guc), err, "Failed to stop GuC PC: %pe\n", 1676 1615 ERR_PTR(err));
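Note: xe_sleep.h is not part of this excerpt, so the behaviour of xe_sleep_exponential_ms() used above is inferred from the open-coded loops it replaces elsewhere in the series (sleep for the current period, double it up to the given cap, return how long was slept). A sketch under that assumption only; the real helper may well use usleep_range() for short delays, as xe_sleep_relaxed_ms() presumably does:

        /* Assumed semantics; not the actual xe_sleep.h implementation. */
        static inline unsigned int sleep_exponential_ms_sketch(unsigned int *period_ms,
                                                               unsigned int max_ms)
        {
                unsigned int slept = *period_ms;

                msleep(*period_ms);                 /* needs <linux/delay.h> */
                if (*period_ms < max_ms)
                        *period_ms <<= 1;           /* 1, 2, 4, ... capped at max_ms */

                return slept;
        }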
+75 -44
drivers/gpu/drm/xe/xe_guc_ct.c
··· 32 32 #include "xe_guc_tlb_inval.h" 33 33 #include "xe_map.h" 34 34 #include "xe_pm.h" 35 + #include "xe_sleep.h" 35 36 #include "xe_sriov_vf.h" 36 37 #include "xe_trace_guc.h" 37 38 ··· 255 254 256 255 #define CTB_DESC_SIZE ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K) 257 256 #define CTB_H2G_BUFFER_OFFSET (CTB_DESC_SIZE * 2) 257 + #define CTB_G2H_BUFFER_OFFSET (CTB_DESC_SIZE * 2) 258 258 #define CTB_H2G_BUFFER_SIZE (SZ_4K) 259 259 #define CTB_H2G_BUFFER_DWORDS (CTB_H2G_BUFFER_SIZE / sizeof(u32)) 260 260 #define CTB_G2H_BUFFER_SIZE (SZ_128K) ··· 276 274 */ 277 275 long xe_guc_ct_queue_proc_time_jiffies(struct xe_guc_ct *ct) 278 276 { 279 - BUILD_BUG_ON(!IS_ALIGNED(CTB_H2G_BUFFER_SIZE, SZ_4)); 277 + BUILD_BUG_ON(!IS_ALIGNED(CTB_H2G_BUFFER_SIZE, SZ_4K)); 280 278 return (CTB_H2G_BUFFER_SIZE / SZ_4K) * HZ; 281 279 } 282 280 283 - static size_t guc_ct_size(void) 281 + static size_t guc_h2g_size(void) 284 282 { 285 - return CTB_H2G_BUFFER_OFFSET + CTB_H2G_BUFFER_SIZE + 286 - CTB_G2H_BUFFER_SIZE; 283 + return CTB_H2G_BUFFER_OFFSET + CTB_H2G_BUFFER_SIZE; 284 + } 285 + 286 + static size_t guc_g2h_size(void) 287 + { 288 + return CTB_G2H_BUFFER_OFFSET + CTB_G2H_BUFFER_SIZE; 287 289 } 288 290 289 291 static void guc_ct_fini(struct drm_device *drm, void *arg) ··· 316 310 struct xe_gt *gt = ct_to_gt(ct); 317 311 int err; 318 312 319 - xe_gt_assert(gt, !(guc_ct_size() % PAGE_SIZE)); 313 + xe_gt_assert(gt, !(guc_h2g_size() % PAGE_SIZE)); 314 + xe_gt_assert(gt, !(guc_g2h_size() % PAGE_SIZE)); 320 315 321 316 err = drmm_mutex_init(&xe->drm, &ct->lock); 322 317 if (err) ··· 362 355 struct xe_tile *tile = gt_to_tile(gt); 363 356 struct xe_bo *bo; 364 357 365 - bo = xe_managed_bo_create_pin_map(xe, tile, guc_ct_size(), 358 + bo = xe_managed_bo_create_pin_map(xe, tile, guc_h2g_size(), 366 359 XE_BO_FLAG_SYSTEM | 367 360 XE_BO_FLAG_GGTT | 368 361 XE_BO_FLAG_GGTT_INVALIDATE | ··· 370 363 if (IS_ERR(bo)) 371 364 return PTR_ERR(bo); 372 365 373 - ct->bo = bo; 366 + ct->ctbs.h2g.bo = bo; 367 + 368 + bo = xe_managed_bo_create_pin_map(xe, tile, guc_g2h_size(), 369 + XE_BO_FLAG_SYSTEM | 370 + XE_BO_FLAG_GGTT | 371 + XE_BO_FLAG_GGTT_INVALIDATE | 372 + XE_BO_FLAG_PINNED_NORESTORE); 373 + if (IS_ERR(bo)) 374 + return PTR_ERR(bo); 375 + 376 + ct->ctbs.g2h.bo = bo; 374 377 375 378 return devm_add_action_or_reset(xe->drm.dev, guc_action_disable_ct, ct); 376 379 } ··· 405 388 xe_assert(xe, !xe_guc_ct_enabled(ct)); 406 389 407 390 if (IS_DGFX(xe)) { 408 - ret = xe_managed_bo_reinit_in_vram(xe, tile, &ct->bo); 391 + ret = xe_managed_bo_reinit_in_vram(xe, tile, &ct->ctbs.h2g.bo); 409 392 if (ret) 410 393 return ret; 411 394 } ··· 455 438 g2h->desc = IOSYS_MAP_INIT_OFFSET(map, CTB_DESC_SIZE); 456 439 xe_map_memset(xe, &g2h->desc, 0, 0, sizeof(struct guc_ct_buffer_desc)); 457 440 458 - g2h->cmds = IOSYS_MAP_INIT_OFFSET(map, CTB_H2G_BUFFER_OFFSET + 459 - CTB_H2G_BUFFER_SIZE); 441 + g2h->cmds = IOSYS_MAP_INIT_OFFSET(map, CTB_G2H_BUFFER_OFFSET); 460 442 } 461 443 462 444 static int guc_ct_ctb_h2g_register(struct xe_guc_ct *ct) ··· 464 448 u32 desc_addr, ctb_addr, size; 465 449 int err; 466 450 467 - desc_addr = xe_bo_ggtt_addr(ct->bo); 468 - ctb_addr = xe_bo_ggtt_addr(ct->bo) + CTB_H2G_BUFFER_OFFSET; 451 + desc_addr = xe_bo_ggtt_addr(ct->ctbs.h2g.bo); 452 + ctb_addr = xe_bo_ggtt_addr(ct->ctbs.h2g.bo) + CTB_H2G_BUFFER_OFFSET; 469 453 size = ct->ctbs.h2g.info.size * sizeof(u32); 470 454 471 455 err = xe_guc_self_cfg64(guc, ··· 491 475 u32 desc_addr, ctb_addr, size; 492 476 int err; 493 477 494 - desc_addr = 
xe_bo_ggtt_addr(ct->bo) + CTB_DESC_SIZE; 495 - ctb_addr = xe_bo_ggtt_addr(ct->bo) + CTB_H2G_BUFFER_OFFSET + 496 - CTB_H2G_BUFFER_SIZE; 478 + desc_addr = xe_bo_ggtt_addr(ct->ctbs.g2h.bo) + CTB_DESC_SIZE; 479 + ctb_addr = xe_bo_ggtt_addr(ct->ctbs.g2h.bo) + CTB_G2H_BUFFER_OFFSET; 497 480 size = ct->ctbs.g2h.info.size * sizeof(u32); 498 481 499 482 err = xe_guc_self_cfg64(guc, ··· 619 604 xe_gt_assert(gt, !xe_guc_ct_enabled(ct)); 620 605 621 606 if (needs_register) { 622 - xe_map_memset(xe, &ct->bo->vmap, 0, 0, xe_bo_size(ct->bo)); 623 - guc_ct_ctb_h2g_init(xe, &ct->ctbs.h2g, &ct->bo->vmap); 624 - guc_ct_ctb_g2h_init(xe, &ct->ctbs.g2h, &ct->bo->vmap); 607 + xe_map_memset(xe, &ct->ctbs.h2g.bo->vmap, 0, 0, 608 + xe_bo_size(ct->ctbs.h2g.bo)); 609 + xe_map_memset(xe, &ct->ctbs.g2h.bo->vmap, 0, 0, 610 + xe_bo_size(ct->ctbs.g2h.bo)); 611 + guc_ct_ctb_h2g_init(xe, &ct->ctbs.h2g, &ct->ctbs.h2g.bo->vmap); 612 + guc_ct_ctb_g2h_init(xe, &ct->ctbs.g2h, &ct->ctbs.g2h.bo->vmap); 625 613 626 614 err = guc_ct_ctb_h2g_register(ct); 627 615 if (err) ··· 641 623 ct->ctbs.h2g.info.broken = false; 642 624 ct->ctbs.g2h.info.broken = false; 643 625 /* Skip everything in H2G buffer */ 644 - xe_map_memset(xe, &ct->bo->vmap, CTB_H2G_BUFFER_OFFSET, 0, 626 + xe_map_memset(xe, &ct->ctbs.h2g.bo->vmap, CTB_H2G_BUFFER_OFFSET, 0, 645 627 CTB_H2G_BUFFER_SIZE); 646 628 } 647 629 ··· 661 643 spin_lock_irq(&ct->dead.lock); 662 644 if (ct->dead.reason) { 663 645 ct->dead.reason |= (1 << CT_DEAD_STATE_REARM); 664 - queue_work(system_unbound_wq, &ct->dead.worker); 646 + queue_work(system_dfl_wq, &ct->dead.worker); 665 647 } 666 648 spin_unlock_irq(&ct->dead.lock); 667 649 #endif ··· 939 921 u32 full_len; 940 922 struct iosys_map map = IOSYS_MAP_INIT_OFFSET(&h2g->cmds, 941 923 tail * sizeof(u32)); 942 - u32 desc_status; 943 924 944 925 full_len = len + GUC_CTB_HDR_LEN; 945 926 946 927 lockdep_assert_held(&ct->lock); 947 928 xe_gt_assert(gt, full_len <= GUC_CTB_MSG_MAX_LEN); 948 929 949 - desc_status = desc_read(xe, h2g, status); 950 - if (desc_status) { 951 - xe_gt_err(gt, "CT write: non-zero status: %u\n", desc_status); 952 - goto corrupted; 953 - } 954 - 955 930 if (IS_ENABLED(CONFIG_DRM_XE_DEBUG)) { 956 931 u32 desc_tail = desc_read(xe, h2g, tail); 957 932 u32 desc_head = desc_read(xe, h2g, head); 933 + u32 desc_status; 934 + 935 + desc_status = desc_read(xe, h2g, status); 936 + if (desc_status) { 937 + xe_gt_err(gt, "CT write: non-zero status: %u\n", desc_status); 938 + goto corrupted; 939 + } 958 940 959 941 if (tail != desc_tail) { 960 942 desc_write(xe, h2g, status, desc_status | GUC_CTB_STATUS_MISMATCH); ··· 1023 1005 /* Update descriptor */ 1024 1006 desc_write(xe, h2g, tail, h2g->info.tail); 1025 1007 1026 - trace_xe_guc_ctb_h2g(xe, gt->info.id, *(action - 1), full_len, 1027 - desc_read(xe, h2g, head), h2g->info.tail); 1008 + /* 1009 + * desc_read() performs an VRAM read which serializes the CPU and drains 1010 + * posted writes on dGPU platforms. Tracepoints evaluate arguments even 1011 + * when disabled, so guard the event to avoid adding µs-scale latency to 1012 + * the fast H2G submission path when tracing is not active. 
1013 + */ 1014 + if (trace_xe_guc_ctb_h2g_enabled()) 1015 + trace_xe_guc_ctb_h2g(xe, gt->info.id, *(action - 1), full_len, 1016 + desc_read(xe, h2g, head), h2g->info.tail); 1028 1017 1029 1018 return 0; 1030 1019 ··· 1126 1101 */ 1127 1102 static bool guc_ct_send_wait_for_retry(struct xe_guc_ct *ct, u32 len, 1128 1103 u32 g2h_len, struct g2h_fence *g2h_fence, 1129 - unsigned int *sleep_period_ms) 1104 + unsigned int *sleep_period_ms, 1105 + unsigned int *sleep_total_ms) 1130 1106 { 1131 1107 struct xe_device *xe = ct_to_xe(ct); 1132 1108 ··· 1141 1115 if (!h2g_has_room(ct, len + GUC_CTB_HDR_LEN)) { 1142 1116 struct guc_ctb *h2g = &ct->ctbs.h2g; 1143 1117 1144 - if (*sleep_period_ms == 1024) 1118 + if (*sleep_total_ms > 1000) 1145 1119 return false; 1146 1120 1147 1121 trace_xe_guc_ct_h2g_flow_control(xe, h2g->info.head, h2g->info.tail, 1148 1122 h2g->info.size, 1149 1123 h2g->info.space, 1150 1124 len + GUC_CTB_HDR_LEN); 1151 - msleep(*sleep_period_ms); 1152 - *sleep_period_ms <<= 1; 1125 + *sleep_total_ms += xe_sleep_exponential_ms(sleep_period_ms, 64); 1153 1126 } else { 1154 - struct xe_device *xe = ct_to_xe(ct); 1155 1127 struct guc_ctb *g2h = &ct->ctbs.g2h; 1156 1128 int ret; 1157 1129 ··· 1171 1147 ret = dequeue_one_g2h(ct); 1172 1148 if (ret < 0) { 1173 1149 if (ret != -ECANCELED) 1174 - xe_gt_err(ct_to_gt(ct), "CTB receive failed (%pe)", 1150 + xe_gt_err(ct_to_gt(ct), "CTB receive failed (%pe)\n", 1175 1151 ERR_PTR(ret)); 1176 1152 return false; 1177 1153 } ··· 1185 1161 { 1186 1162 struct xe_gt *gt = ct_to_gt(ct); 1187 1163 unsigned int sleep_period_ms = 1; 1164 + unsigned int sleep_total_ms = 0; 1188 1165 int ret; 1189 1166 1190 1167 xe_gt_assert(gt, !g2h_len || !g2h_fence); ··· 1198 1173 1199 1174 if (unlikely(ret == -EBUSY)) { 1200 1175 if (!guc_ct_send_wait_for_retry(ct, len, g2h_len, g2h_fence, 1201 - &sleep_period_ms)) 1176 + &sleep_period_ms, &sleep_total_ms)) 1202 1177 goto broken; 1203 1178 goto try_again; 1204 1179 } ··· 1347 1322 */ 1348 1323 mutex_lock(&ct->lock); 1349 1324 if (!ret) { 1350 - xe_gt_err(gt, "Timed out wait for G2H, fence %u, action %04x, done %s", 1325 + xe_gt_err(gt, "Timed out wait for G2H, fence %u, action %04x, done %s\n", 1351 1326 g2h_fence.seqno, action[0], str_yes_no(g2h_fence.done)); 1352 1327 xa_erase(&ct->fence_lookup, g2h_fence.seqno); 1353 1328 mutex_unlock(&ct->lock); ··· 1857 1832 ret = xe_guc_tlb_inval_done_handler(guc, payload, adj_len); 1858 1833 break; 1859 1834 default: 1860 - xe_gt_warn(gt, "NOT_POSSIBLE"); 1835 + xe_gt_warn(gt, "NOT_POSSIBLE\n"); 1861 1836 } 1862 1837 1863 1838 if (ret) { ··· 1960 1935 mutex_unlock(&ct->lock); 1961 1936 1962 1937 if (unlikely(ret == -EPROTO || ret == -EOPNOTSUPP)) { 1963 - xe_gt_err(ct_to_gt(ct), "CT dequeue failed: %d", ret); 1938 + xe_gt_err(ct_to_gt(ct), "CT dequeue failed: %d\n", ret); 1964 1939 CT_DEAD(ct, NULL, G2H_RECV); 1965 1940 kick_reset(ct); 1966 1941 } ··· 1986 1961 if (!snapshot) 1987 1962 return NULL; 1988 1963 1989 - if (ct->bo && want_ctb) { 1990 - snapshot->ctb_size = xe_bo_size(ct->bo); 1964 + if (ct->ctbs.h2g.bo && ct->ctbs.g2h.bo && want_ctb) { 1965 + snapshot->ctb_size = xe_bo_size(ct->ctbs.h2g.bo) + 1966 + xe_bo_size(ct->ctbs.g2h.bo); 1991 1967 snapshot->ctb = kmalloc(snapshot->ctb_size, atomic ? 
GFP_ATOMIC : GFP_KERNEL); 1992 1968 } 1993 1969 ··· 2036 2010 guc_ctb_snapshot_capture(xe, &ct->ctbs.g2h, &snapshot->g2h); 2037 2011 } 2038 2012 2039 - if (ct->bo && snapshot->ctb) 2040 - xe_map_memcpy_from(xe, snapshot->ctb, &ct->bo->vmap, 0, snapshot->ctb_size); 2013 + if (ct->ctbs.h2g.bo && ct->ctbs.g2h.bo && snapshot->ctb) { 2014 + xe_map_memcpy_from(xe, snapshot->ctb, &ct->ctbs.h2g.bo->vmap, 0, 2015 + xe_bo_size(ct->ctbs.h2g.bo)); 2016 + xe_map_memcpy_from(xe, snapshot->ctb + xe_bo_size(ct->ctbs.h2g.bo), 2017 + &ct->ctbs.g2h.bo->vmap, 0, 2018 + xe_bo_size(ct->ctbs.g2h.bo)); 2019 + } 2041 2020 2042 2021 return snapshot; 2043 2022 } ··· 2196 2165 2197 2166 spin_unlock_irqrestore(&ct->dead.lock, flags); 2198 2167 2199 - queue_work(system_unbound_wq, &(ct)->dead.worker); 2168 + queue_work(system_dfl_wq, &(ct)->dead.worker); 2200 2169 } 2201 2170 2202 2171 static void ct_dead_print(struct xe_dead_ct *dead)
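Note: on the new time-based bail-out in guc_ct_send_wait_for_retry() above: assuming xe_sleep_exponential_ms() sleeps the current period and doubles it up to the 64 ms cap (see the earlier note), the cumulative wait reaches 1 + 2 + ... + 64 = 127 ms after seven attempts and then grows by 64 ms per attempt, so sleep_total_ms first exceeds the 1000 ms limit after roughly 21 waits. That is about the same total (~1 s) as the old "*sleep_period_ms == 1024" cut-off, but it no longer depends on the exact doubling sequence.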
+2 -2
drivers/gpu/drm/xe/xe_guc_ct_types.h
··· 39 39 * struct guc_ctb - GuC command transport buffer (CTB) 40 40 */ 41 41 struct guc_ctb { 42 + /** @bo: Xe BO for CTB */ 43 + struct xe_bo *bo; 42 44 /** @desc: dma buffer map for CTB descriptor */ 43 45 struct iosys_map desc; 44 46 /** @cmds: dma buffer map for CTB commands */ ··· 128 126 * for the H2G and G2H requests sent and received through the buffers. 129 127 */ 130 128 struct xe_guc_ct { 131 - /** @bo: Xe BO for CT */ 132 - struct xe_bo *bo; 133 129 /** @lock: protects everything in CT layer */ 134 130 struct mutex lock; 135 131 /** @fast_lock: protects G2H channel and credits */
+3 -2
drivers/gpu/drm/xe/xe_guc_fwif.h
··· 261 261 #define PFD_ACCESS_TYPE GENMASK(1, 0) 262 262 #define PFD_FAULT_TYPE GENMASK(3, 2) 263 263 #define PFD_VFID GENMASK(9, 4) 264 - #define PFD_RSVD_1 GENMASK(11, 10) 264 + #define PFD_RSVD_1 BIT(10) 265 + #define PFD_PREFETCH BIT(11) /* Only valid on Xe3+, reserved on prior platforms */ 265 266 #define PFD_VIRTUAL_ADDR_LO GENMASK(31, 12) 266 267 #define PFD_VIRTUAL_ADDR_LO_SHIFT 12 267 268 ··· 282 281 283 282 u32 dw1; 284 283 #define PFR_VFID GENMASK(5, 0) 285 - #define PFR_RSVD_1 BIT(6) 284 + #define PFR_PREFETCH BIT(6) /* Only valid on Xe3+, reserved on prior platforms */ 286 285 #define PFR_ENG_INSTANCE GENMASK(12, 7) 287 286 #define PFR_ENG_CLASS GENMASK(15, 13) 288 287 #define PFR_PDATA GENMASK(31, 16)
+5 -1
drivers/gpu/drm/xe/xe_guc_log.h
··· 13 13 struct xe_device; 14 14 15 15 #if IS_ENABLED(CONFIG_DRM_XE_DEBUG_GUC) 16 - #define XE_GUC_LOG_EVENT_DATA_BUFFER_SIZE SZ_8M 16 + #define XE_GUC_LOG_EVENT_DATA_BUFFER_SIZE SZ_16M 17 17 #define XE_GUC_LOG_CRASH_DUMP_BUFFER_SIZE SZ_1M 18 18 #define XE_GUC_LOG_STATE_CAPTURE_BUFFER_SIZE SZ_2M 19 + #elif IS_ENABLED(CONFIG_DRM_XE_DEBUG) 20 + #define XE_GUC_LOG_EVENT_DATA_BUFFER_SIZE SZ_8M 21 + #define XE_GUC_LOG_CRASH_DUMP_BUFFER_SIZE SZ_1M 22 + #define XE_GUC_LOG_STATE_CAPTURE_BUFFER_SIZE SZ_1M 19 23 #else 20 24 #define XE_GUC_LOG_EVENT_DATA_BUFFER_SIZE SZ_64K 21 25 #define XE_GUC_LOG_CRASH_DUMP_BUFFER_SIZE SZ_16K
+13 -5
drivers/gpu/drm/xe/xe_guc_pagefault.c
··· 8 8 #include "xe_guc_ct.h" 9 9 #include "xe_guc_pagefault.h" 10 10 #include "xe_pagefault.h" 11 + #include "xe_pagefault_types.h" 11 12 12 13 static void guc_ack_fault(struct xe_pagefault *pf, int err) 13 14 { 14 15 u32 vfid = FIELD_GET(PFD_VFID, pf->producer.msg[2]); 16 + u32 prefetch = FIELD_GET(PFD_PREFETCH, pf->producer.msg[2]); 15 17 u32 engine_instance = FIELD_GET(PFD_ENG_INSTANCE, pf->producer.msg[0]); 16 18 u32 engine_class = FIELD_GET(PFD_ENG_CLASS, pf->producer.msg[0]); 17 19 u32 pdata = FIELD_GET(PFD_PDATA_LO, pf->producer.msg[0]) | 18 20 (FIELD_GET(PFD_PDATA_HI, pf->producer.msg[1]) << 19 21 PFD_PDATA_HI_SHIFT); 22 + u32 asid = FIELD_GET(PFD_ASID, pf->producer.msg[1]); 20 23 u32 action[] = { 21 24 XE_GUC_ACTION_PAGE_FAULT_RES_DESC, 22 25 ··· 27 24 FIELD_PREP(PFR_SUCCESS, !!err) | 28 25 FIELD_PREP(PFR_REPLY, PFR_ACCESS) | 29 26 FIELD_PREP(PFR_DESC_TYPE, FAULT_RESPONSE_DESC) | 30 - FIELD_PREP(PFR_ASID, pf->consumer.asid), 27 + FIELD_PREP(PFR_ASID, asid), 31 28 32 29 FIELD_PREP(PFR_VFID, vfid) | 30 + FIELD_PREP(PFR_PREFETCH, err ? prefetch : 0) | 33 31 FIELD_PREP(PFR_ENG_INSTANCE, engine_instance) | 34 32 FIELD_PREP(PFR_ENG_CLASS, engine_class) | 35 33 FIELD_PREP(PFR_PDATA, pdata), ··· 79 75 (FIELD_GET(PFD_VIRTUAL_ADDR_LO, msg[2]) << 80 76 PFD_VIRTUAL_ADDR_LO_SHIFT); 81 77 pf.consumer.asid = FIELD_GET(PFD_ASID, msg[1]); 82 - pf.consumer.access_type = FIELD_GET(PFD_ACCESS_TYPE, msg[2]); 83 - pf.consumer.fault_type = FIELD_GET(PFD_FAULT_TYPE, msg[2]); 78 + pf.consumer.access_type = FIELD_GET(PFD_ACCESS_TYPE, msg[2]) | 79 + (FIELD_GET(PFD_PREFETCH, msg[2]) ? XE_PAGEFAULT_ACCESS_PREFETCH : 0); 84 80 if (FIELD_GET(XE2_PFD_TRVA_FAULT, msg[0])) 85 - pf.consumer.fault_level = XE_PAGEFAULT_LEVEL_NACK; 81 + pf.consumer.fault_type_level = XE_PAGEFAULT_TYPE_LEVEL_NACK; 86 82 else 87 - pf.consumer.fault_level = FIELD_GET(PFD_FAULT_LEVEL, msg[0]); 83 + pf.consumer.fault_type_level = 84 + FIELD_PREP(XE_PAGEFAULT_LEVEL_MASK, 85 + FIELD_GET(PFD_FAULT_LEVEL, msg[0])) | 86 + FIELD_PREP(XE_PAGEFAULT_TYPE_MASK, 87 + FIELD_GET(PFD_FAULT_TYPE, msg[2])); 88 88 pf.consumer.engine_class = FIELD_GET(PFD_ENG_CLASS, msg[0]); 89 89 pf.consumer.engine_instance = FIELD_GET(PFD_ENG_INSTANCE, msg[0]); 90 90
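Note: with the producer now packing both fields into consumer.fault_type_level, a consumer would unpack them with the matching masks. The XE_PAGEFAULT_* names below appear in this hunk but are defined in xe_pagefault_types.h, which is not shown here, so treat the exact field widths as an assumption:

        /* Sketch: recover the fields packed by the GuC pagefault producer above. */
        unsigned int level = FIELD_GET(XE_PAGEFAULT_LEVEL_MASK, pf->consumer.fault_type_level);
        unsigned int type = FIELD_GET(XE_PAGEFAULT_TYPE_MASK, pf->consumer.fault_type_level);
        bool prefetch = pf->consumer.access_type & XE_PAGEFAULT_ACCESS_PREFETCH;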
+40 -78
drivers/gpu/drm/xe/xe_guc_pc.c
··· 92 92 * Render-C states is also a GuC PC feature that is now enabled in Xe for 93 93 * all platforms. 94 94 * 95 + * Implementation details: 96 + * ----------------------- 97 + * The implementation for GuC Power Management features is split as follows: 98 + * 99 + * xe_guc_rc: Logic for handling GuC RC 100 + * xe_gt_idle: Host side logic for RC6 and Coarse Power gating (CPG) 101 + * xe_guc_pc: Logic for all other SLPC related features 102 + * 103 + * There is some cross interaction between these where host C6 will need to be 104 + * enabled when we plan to skip GuC RC. Also, the GuC RC mode is currently 105 + * overridden through 0x3003 which is an SLPC H2G call. 95 106 */ 96 107 97 108 static struct xe_guc *pc_to_guc(struct xe_guc_pc *pc) ··· 264 253 return ret; 265 254 } 266 255 267 - static int pc_action_setup_gucrc(struct xe_guc_pc *pc, u32 mode) 256 + /** 257 + * xe_guc_pc_action_set_param() - Set value of SLPC param 258 + * @pc: Xe_GuC_PC instance 259 + * @id: Param id 260 + * @value: Value to set 261 + * 262 + * This function can be used to set any SLPC param. 263 + * 264 + * Return: 0 on Success 265 + */ 266 + int xe_guc_pc_action_set_param(struct xe_guc_pc *pc, u8 id, u32 value) 268 267 { 269 - struct xe_guc_ct *ct = pc_to_ct(pc); 270 - u32 action[] = { 271 - GUC_ACTION_HOST2GUC_SETUP_PC_GUCRC, 272 - mode, 273 - }; 274 - int ret; 268 + xe_device_assert_mem_access(pc_to_xe(pc)); 269 + return pc_action_set_param(pc, id, value); 270 + } 275 271 276 - ret = xe_guc_ct_send(ct, action, ARRAY_SIZE(action), 0, 0); 277 - if (ret && !(xe_device_wedged(pc_to_xe(pc)) && ret == -ECANCELED)) 278 - xe_gt_err(pc_to_gt(pc), "GuC RC enable mode=%u failed: %pe\n", 279 - mode, ERR_PTR(ret)); 280 - return ret; 272 + /** 273 + * xe_guc_pc_action_unset_param() - Revert to default value 274 + * @pc: Xe_GuC_PC instance 275 + * @id: Param id 276 + * 277 + * This function can be used revert any SLPC param to its default value. 278 + * 279 + * Return: 0 on Success 280 + */ 281 + int xe_guc_pc_action_unset_param(struct xe_guc_pc *pc, u8 id) 282 + { 283 + xe_device_assert_mem_access(pc_to_xe(pc)); 284 + return pc_action_unset_param(pc, id); 281 285 } 282 286 283 287 static u32 decode_freq(u32 raw) ··· 1076 1050 return ret; 1077 1051 } 1078 1052 1079 - /** 1080 - * xe_guc_pc_gucrc_disable - Disable GuC RC 1081 - * @pc: Xe_GuC_PC instance 1082 - * 1083 - * Disables GuC RC by taking control of RC6 back from GuC. 1084 - * 1085 - * Return: 0 on success, negative error code on error. 1086 - */ 1087 - int xe_guc_pc_gucrc_disable(struct xe_guc_pc *pc) 1088 - { 1089 - struct xe_device *xe = pc_to_xe(pc); 1090 - struct xe_gt *gt = pc_to_gt(pc); 1091 - int ret = 0; 1092 - 1093 - if (xe->info.skip_guc_pc) 1094 - return 0; 1095 - 1096 - ret = pc_action_setup_gucrc(pc, GUCRC_HOST_CONTROL); 1097 - if (ret) 1098 - return ret; 1099 - 1100 - return xe_gt_idle_disable_c6(gt); 1101 - } 1102 - 1103 - /** 1104 - * xe_guc_pc_override_gucrc_mode - override GUCRC mode 1105 - * @pc: Xe_GuC_PC instance 1106 - * @mode: new value of the mode. 
1107 - * 1108 - * Return: 0 on success, negative error code on error 1109 - */ 1110 - int xe_guc_pc_override_gucrc_mode(struct xe_guc_pc *pc, enum slpc_gucrc_mode mode) 1111 - { 1112 - guard(xe_pm_runtime)(pc_to_xe(pc)); 1113 - return pc_action_set_param(pc, SLPC_PARAM_PWRGATE_RC_MODE, mode); 1114 - } 1115 - 1116 - /** 1117 - * xe_guc_pc_unset_gucrc_mode - unset GUCRC mode override 1118 - * @pc: Xe_GuC_PC instance 1119 - * 1120 - * Return: 0 on success, negative error code on error 1121 - */ 1122 - int xe_guc_pc_unset_gucrc_mode(struct xe_guc_pc *pc) 1123 - { 1124 - guard(xe_pm_runtime)(pc_to_xe(pc)); 1125 - return pc_action_unset_param(pc, SLPC_PARAM_PWRGATE_RC_MODE); 1126 - } 1127 - 1128 1053 static void pc_init_pcode_freq(struct xe_guc_pc *pc) 1129 1054 { 1130 1055 u32 min = DIV_ROUND_CLOSEST(pc->rpn_freq, GT_FREQUENCY_MULTIPLIER); ··· 1224 1247 return -ETIMEDOUT; 1225 1248 1226 1249 if (xe->info.skip_guc_pc) { 1227 - if (xe->info.platform != XE_PVC) 1228 - xe_gt_idle_enable_c6(gt); 1229 - 1230 1250 /* Request max possible since dynamic freq mgmt is not enabled */ 1231 1251 pc_set_cur_freq(pc, UINT_MAX); 1232 1252 return 0; ··· 1265 1291 if (ret) 1266 1292 return ret; 1267 1293 1268 - if (xe->info.platform == XE_PVC) { 1269 - xe_guc_pc_gucrc_disable(pc); 1270 - return 0; 1271 - } 1272 - 1273 - ret = pc_action_setup_gucrc(pc, GUCRC_FIRMWARE_CONTROL); 1274 - if (ret) 1275 - return ret; 1276 - 1277 1294 /* Enable SLPC Optimized Strategy for compute */ 1278 1295 ret = pc_action_set_strategy(pc, SLPC_OPTIMIZED_STRATEGY_COMPUTE); 1279 1296 ··· 1284 1319 { 1285 1320 struct xe_device *xe = pc_to_xe(pc); 1286 1321 1287 - if (xe->info.skip_guc_pc) { 1288 - xe_gt_idle_disable_c6(pc_to_gt(pc)); 1322 + if (xe->info.skip_guc_pc) 1289 1323 return 0; 1290 - } 1291 1324 1292 1325 mutex_lock(&pc->freq_lock); 1293 1326 pc->freq_ready = false; ··· 1306 1343 if (xe_device_wedged(xe)) 1307 1344 return; 1308 1345 1309 - CLASS(xe_force_wake, fw_ref)(gt_to_fw(pc_to_gt(pc)), XE_FORCEWAKE_ALL); 1310 - xe_guc_pc_gucrc_disable(pc); 1346 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(pc_to_gt(pc)), XE_FW_GT); 1311 1347 XE_WARN_ON(xe_guc_pc_stop(pc)); 1312 1348 1313 1349 /* Bind requested freq to mert_freq_cap before unload */
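Note: with xe_guc_pc_override_gucrc_mode()/xe_guc_pc_unset_gucrc_mode() removed above, reaching the same SLPC parameter goes through the new generic setters. A sketch mirroring the deleted helpers (pc_to_xe() stands in for however the caller reaches the xe_device; the removed code also held a runtime-PM reference, kept here):

        guard(xe_pm_runtime)(pc_to_xe(pc));

        err = xe_guc_pc_action_set_param(pc, SLPC_PARAM_PWRGATE_RC_MODE, mode);
        /* ... and later, to drop the override: */
        err = xe_guc_pc_action_unset_param(pc, SLPC_PARAM_PWRGATE_RC_MODE);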
+2 -4
drivers/gpu/drm/xe/xe_guc_pc.h
··· 9 9 #include <linux/types.h> 10 10 11 11 struct xe_guc_pc; 12 - enum slpc_gucrc_mode; 13 12 struct drm_printer; 14 13 15 14 int xe_guc_pc_init(struct xe_guc_pc *pc); 16 15 int xe_guc_pc_start(struct xe_guc_pc *pc); 17 16 int xe_guc_pc_stop(struct xe_guc_pc *pc); 18 - int xe_guc_pc_gucrc_disable(struct xe_guc_pc *pc); 19 - int xe_guc_pc_override_gucrc_mode(struct xe_guc_pc *pc, enum slpc_gucrc_mode mode); 20 - int xe_guc_pc_unset_gucrc_mode(struct xe_guc_pc *pc); 21 17 void xe_guc_pc_print(struct xe_guc_pc *pc, struct drm_printer *p); 18 + int xe_guc_pc_action_set_param(struct xe_guc_pc *pc, u8 id, u32 value); 19 + int xe_guc_pc_action_unset_param(struct xe_guc_pc *pc, u8 id); 22 20 23 21 u32 xe_guc_pc_get_act_freq(struct xe_guc_pc *pc); 24 22 int xe_guc_pc_get_cur_freq(struct xe_guc_pc *pc, u32 *freq);
+131
drivers/gpu/drm/xe/xe_guc_rc.c
··· 1 + // SPDX-License-Identifier: MIT 2 + /* 3 + * Copyright © 2026 Intel Corporation 4 + */ 5 + 6 + #include <drm/drm_print.h> 7 + 8 + #include "abi/guc_actions_slpc_abi.h" 9 + #include "xe_device.h" 10 + #include "xe_force_wake.h" 11 + #include "xe_gt.h" 12 + #include "xe_gt_idle.h" 13 + #include "xe_gt_printk.h" 14 + #include "xe_guc.h" 15 + #include "xe_guc_ct.h" 16 + #include "xe_guc_pc.h" 17 + #include "xe_guc_rc.h" 18 + #include "xe_pm.h" 19 + 20 + /** 21 + * DOC: GuC RC (Render C-states) 22 + * 23 + * GuC handles the GT transition to deeper C-states in conjunction with Pcode. 24 + * GuC RC can be enabled independently of the frequency component in SLPC, 25 + * which is also controlled by GuC. 26 + * 27 + * This file will contain all H2G related logic for handling Render C-states. 28 + * There are some calls to xe_gt_idle, where we enable host C6 when GuC RC is 29 + * skipped. GuC RC is mostly independent of xe_guc_pc with the exception of 30 + * functions that override the mode for which we have to rely on the SLPC H2G 31 + * calls. 32 + */ 33 + 34 + static int guc_action_setup_gucrc(struct xe_guc *guc, u32 control) 35 + { 36 + u32 action[] = { 37 + GUC_ACTION_HOST2GUC_SETUP_PC_GUCRC, 38 + control, 39 + }; 40 + int ret; 41 + 42 + ret = xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0); 43 + if (ret && !(xe_device_wedged(guc_to_xe(guc)) && ret == -ECANCELED)) 44 + xe_gt_err(guc_to_gt(guc), 45 + "GuC RC setup %s(%u) failed (%pe)\n", 46 + control == GUCRC_HOST_CONTROL ? "HOST_CONTROL" : 47 + control == GUCRC_FIRMWARE_CONTROL ? "FIRMWARE_CONTROL" : 48 + "UNKNOWN", control, ERR_PTR(ret)); 49 + return ret; 50 + } 51 + 52 + /** 53 + * xe_guc_rc_disable() - Disable GuC RC 54 + * @guc: Xe GuC instance 55 + * 56 + * Disables GuC RC by taking control of RC6 back from GuC. 57 + */ 58 + void xe_guc_rc_disable(struct xe_guc *guc) 59 + { 60 + struct xe_device *xe = guc_to_xe(guc); 61 + struct xe_gt *gt = guc_to_gt(guc); 62 + 63 + if (!xe->info.skip_guc_pc && xe->info.platform != XE_PVC) 64 + if (guc_action_setup_gucrc(guc, GUCRC_HOST_CONTROL)) 65 + return; 66 + 67 + xe_gt_WARN_ON(gt, xe_gt_idle_disable_c6(gt)); 68 + } 69 + 70 + static void xe_guc_rc_fini_hw(void *arg) 71 + { 72 + struct xe_guc *guc = arg; 73 + struct xe_device *xe = guc_to_xe(guc); 74 + struct xe_gt *gt = guc_to_gt(guc); 75 + 76 + if (xe_device_wedged(xe)) 77 + return; 78 + 79 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT); 80 + xe_guc_rc_disable(guc); 81 + } 82 + 83 + /** 84 + * xe_guc_rc_init() - Init GuC RC 85 + * @guc: Xe GuC instance 86 + * 87 + * Add callback action for GuC RC 88 + * 89 + * Return: 0 on success, negative error code on error. 90 + */ 91 + int xe_guc_rc_init(struct xe_guc *guc) 92 + { 93 + struct xe_device *xe = guc_to_xe(guc); 94 + struct xe_gt *gt = guc_to_gt(guc); 95 + 96 + xe_gt_assert(gt, xe_device_uc_enabled(xe)); 97 + 98 + return devm_add_action_or_reset(xe->drm.dev, xe_guc_rc_fini_hw, guc); 99 + } 100 + 101 + /** 102 + * xe_guc_rc_enable() - Enable GuC RC feature if applicable 103 + * @guc: Xe GuC instance 104 + * 105 + * Enables GuC RC feature. 106 + * 107 + * Return: 0 on success, negative error code on error. 
108 + */ 109 + int xe_guc_rc_enable(struct xe_guc *guc) 110 + { 111 + struct xe_device *xe = guc_to_xe(guc); 112 + struct xe_gt *gt = guc_to_gt(guc); 113 + 114 + xe_gt_assert(gt, xe_device_uc_enabled(xe)); 115 + 116 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT); 117 + if (!xe_force_wake_ref_has_domain(fw_ref.domains, XE_FW_GT)) 118 + return -ETIMEDOUT; 119 + 120 + if (xe->info.platform == XE_PVC) { 121 + xe_guc_rc_disable(guc); 122 + return 0; 123 + } 124 + 125 + if (xe->info.skip_guc_pc) { 126 + xe_gt_idle_enable_c6(gt); 127 + return 0; 128 + } 129 + 130 + return guc_action_setup_gucrc(guc, GUCRC_FIRMWARE_CONTROL); 131 + }
+16
drivers/gpu/drm/xe/xe_guc_rc.h
··· 1 + /* SPDX-License-Identifier: MIT */ 2 + /* 3 + * Copyright © 2026 Intel Corporation 4 + */ 5 + 6 + #ifndef _XE_GUC_RC_H_ 7 + #define _XE_GUC_RC_H_ 8 + 9 + struct xe_guc; 10 + enum slpc_gucrc_mode; 11 + 12 + int xe_guc_rc_init(struct xe_guc *guc); 13 + int xe_guc_rc_enable(struct xe_guc *guc); 14 + void xe_guc_rc_disable(struct xe_guc *guc); 15 + 16 + #endif
+122 -54
drivers/gpu/drm/xe/xe_guc_submit.c
··· 8 8 #include <linux/bitfield.h> 9 9 #include <linux/bitmap.h> 10 10 #include <linux/circ_buf.h> 11 - #include <linux/delay.h> 12 11 #include <linux/dma-fence-array.h> 13 - #include <linux/math64.h> 14 12 15 13 #include <drm/drm_managed.h> 16 14 ··· 40 42 #include "xe_pm.h" 41 43 #include "xe_ring_ops_types.h" 42 44 #include "xe_sched_job.h" 45 + #include "xe_sleep.h" 43 46 #include "xe_trace.h" 44 47 #include "xe_uc_fw.h" 45 48 #include "xe_vm.h" ··· 555 556 xe_sched_tdr_queue_imm(&q->guc->sched); 556 557 } 557 558 559 + static void xe_guc_exec_queue_group_stop(struct xe_exec_queue *q) 560 + { 561 + struct xe_exec_queue *primary = xe_exec_queue_multi_queue_primary(q); 562 + struct xe_exec_queue_group *group = q->multi_queue.group; 563 + struct xe_exec_queue *eq, *next; 564 + LIST_HEAD(tmp); 565 + 566 + xe_gt_assert(guc_to_gt(exec_queue_to_guc(q)), 567 + xe_exec_queue_is_multi_queue(q)); 568 + 569 + mutex_lock(&group->list_lock); 570 + 571 + /* 572 + * Stop all future queues being from executing while group is stopped. 573 + */ 574 + group->stopped = true; 575 + 576 + list_for_each_entry_safe(eq, next, &group->list, multi_queue.link) 577 + /* 578 + * Refcount prevents an attempted removal from &group->list, 579 + * temporary list allows safe iteration after dropping 580 + * &group->list_lock. 581 + */ 582 + if (xe_exec_queue_get_unless_zero(eq)) 583 + list_move_tail(&eq->multi_queue.link, &tmp); 584 + 585 + mutex_unlock(&group->list_lock); 586 + 587 + /* We cannot stop under list lock without getting inversions */ 588 + xe_sched_submission_stop(&primary->guc->sched); 589 + list_for_each_entry(eq, &tmp, multi_queue.link) 590 + xe_sched_submission_stop(&eq->guc->sched); 591 + 592 + mutex_lock(&group->list_lock); 593 + list_for_each_entry_safe(eq, next, &tmp, multi_queue.link) { 594 + /* 595 + * Corner where we got banned while stopping and not on 596 + * &group->list 597 + */ 598 + if (READ_ONCE(group->banned)) 599 + xe_guc_exec_queue_trigger_cleanup(eq); 600 + 601 + list_move_tail(&eq->multi_queue.link, &group->list); 602 + xe_exec_queue_put(eq); 603 + } 604 + mutex_unlock(&group->list_lock); 605 + } 606 + 607 + static void xe_guc_exec_queue_group_start(struct xe_exec_queue *q) 608 + { 609 + struct xe_exec_queue *primary = xe_exec_queue_multi_queue_primary(q); 610 + struct xe_exec_queue_group *group = q->multi_queue.group; 611 + struct xe_exec_queue *eq; 612 + 613 + xe_gt_assert(guc_to_gt(exec_queue_to_guc(q)), 614 + xe_exec_queue_is_multi_queue(q)); 615 + 616 + xe_sched_submission_start(&primary->guc->sched); 617 + 618 + mutex_lock(&group->list_lock); 619 + group->stopped = false; 620 + list_for_each_entry(eq, &group->list, multi_queue.link) 621 + xe_sched_submission_start(&eq->guc->sched); 622 + mutex_unlock(&group->list_lock); 623 + } 624 + 558 625 static void xe_guc_exec_queue_group_trigger_cleanup(struct xe_exec_queue *q) 559 626 { 560 627 struct xe_exec_queue *primary = xe_exec_queue_multi_queue_primary(q); ··· 803 738 { 804 739 struct xe_exec_queue_group *group = q->multi_queue.group; 805 740 struct xe_device *xe = guc_to_xe(guc); 741 + enum xe_multi_queue_priority priority; 806 742 long ret; 807 743 808 744 /* ··· 827 761 return; 828 762 } 829 763 830 - xe_lrc_set_multi_queue_priority(q->lrc[0], q->multi_queue.priority); 764 + scoped_guard(spinlock, &q->multi_queue.lock) 765 + priority = q->multi_queue.priority; 766 + 767 + xe_lrc_set_multi_queue_priority(q->lrc[0], priority); 831 768 xe_guc_exec_queue_group_cgp_update(xe, q); 832 769 833 770 WRITE_ONCE(group->sync_pending, 
true); ··· 1031 962 return (WQ_SIZE - q->guc->wqi_tail); 1032 963 } 1033 964 1034 - static inline void relaxed_ms_sleep(unsigned int delay_ms) 1035 - { 1036 - unsigned long min_us, max_us; 1037 - 1038 - if (!delay_ms) 1039 - return; 1040 - 1041 - if (delay_ms > 20) { 1042 - msleep(delay_ms); 1043 - return; 1044 - } 1045 - 1046 - min_us = mul_u32_u32(delay_ms, 1000); 1047 - max_us = min_us + 500; 1048 - 1049 - usleep_range(min_us, max_us); 1050 - } 1051 - 1052 965 static int wq_wait_for_space(struct xe_exec_queue *q, u32 wqi_size) 1053 966 { 1054 967 struct xe_guc *guc = exec_queue_to_guc(q); ··· 1049 998 return -ENODEV; 1050 999 } 1051 1000 1052 - msleep(sleep_period_ms); 1053 - sleep_total_ms += sleep_period_ms; 1054 - if (sleep_period_ms < 64) 1055 - sleep_period_ms <<= 1; 1001 + sleep_total_ms += xe_sleep_exponential_ms(&sleep_period_ms, 64); 1056 1002 goto try_again; 1057 1003 } 1058 1004 } ··· 1462 1414 { 1463 1415 struct xe_sched_job *job = to_xe_sched_job(drm_job); 1464 1416 struct drm_sched_job *tmp_job; 1465 - struct xe_exec_queue *q = job->q; 1417 + struct xe_exec_queue *q = job->q, *primary; 1466 1418 struct xe_gpu_scheduler *sched = &q->guc->sched; 1467 1419 struct xe_guc *guc = exec_queue_to_guc(q); 1468 1420 const char *process_name = "no process"; ··· 1472 1424 bool wedged = false, skip_timeout_check; 1473 1425 1474 1426 xe_gt_assert(guc_to_gt(guc), !exec_queue_destroyed(q)); 1427 + 1428 + primary = xe_exec_queue_multi_queue_primary(q); 1475 1429 1476 1430 /* 1477 1431 * TDR has fired before free job worker. Common if exec queue ··· 1486 1436 return DRM_GPU_SCHED_STAT_NO_HANG; 1487 1437 1488 1438 /* Kill the run_job entry point */ 1489 - xe_sched_submission_stop(sched); 1439 + if (xe_exec_queue_is_multi_queue(q)) 1440 + xe_guc_exec_queue_group_stop(q); 1441 + else 1442 + xe_sched_submission_stop(sched); 1490 1443 1491 1444 /* Must check all state after stopping scheduler */ 1492 1445 skip_timeout_check = exec_queue_reset(q) || ··· 1503 1450 /* LR jobs can only get here if queue has been killed or hit an error */ 1504 1451 if (xe_exec_queue_is_lr(q)) 1505 1452 xe_gt_assert(guc_to_gt(guc), skip_timeout_check); 1506 - 1507 - /* 1508 - * FIXME: In multi-queue scenario, the TDR must ensure that the whole 1509 - * multi-queue group is off the HW before signaling the fences to avoid 1510 - * possible memory corruptions. This means disabling scheduling on the 1511 - * primary queue before or during the secondary queue's TDR. Need to 1512 - * implement this in least obtrusive way. 
1513 - */ 1514 1453 1515 1454 /* 1516 1455 * If devcoredump not captured and GuC capture for the job is not ready ··· 1530 1485 set_exec_queue_banned(q); 1531 1486 1532 1487 /* Kick job / queue off hardware */ 1533 - if (!wedged && (exec_queue_enabled(q) || exec_queue_pending_disable(q))) { 1488 + if (!wedged && (exec_queue_enabled(primary) || 1489 + exec_queue_pending_disable(primary))) { 1534 1490 int ret; 1535 1491 1536 - if (exec_queue_reset(q)) 1492 + if (exec_queue_reset(primary)) 1537 1493 err = -EIO; 1538 1494 1539 1495 if (xe_uc_fw_is_running(&guc->fw)) { ··· 1543 1497 * modifying state 1544 1498 */ 1545 1499 ret = wait_event_timeout(guc->ct.wq, 1546 - (!exec_queue_pending_enable(q) && 1547 - !exec_queue_pending_disable(q)) || 1500 + (!exec_queue_pending_enable(primary) && 1501 + !exec_queue_pending_disable(primary)) || 1548 1502 xe_guc_read_stopped(guc) || 1549 1503 vf_recovery(guc), HZ * 5); 1550 1504 if (vf_recovery(guc)) ··· 1552 1506 if (!ret || xe_guc_read_stopped(guc)) 1553 1507 goto trigger_reset; 1554 1508 1555 - disable_scheduling(q, skip_timeout_check); 1509 + disable_scheduling(primary, skip_timeout_check); 1556 1510 } 1557 1511 1558 1512 /* ··· 1566 1520 smp_rmb(); 1567 1521 ret = wait_event_timeout(guc->ct.wq, 1568 1522 !xe_uc_fw_is_running(&guc->fw) || 1569 - !exec_queue_pending_disable(q) || 1523 + !exec_queue_pending_disable(primary) || 1570 1524 xe_guc_read_stopped(guc) || 1571 1525 vf_recovery(guc), HZ * 5); 1572 1526 if (vf_recovery(guc)) ··· 1576 1530 if (!ret) 1577 1531 xe_gt_warn(guc_to_gt(guc), 1578 1532 "Schedule disable failed to respond, guc_id=%d", 1579 - q->guc->id); 1580 - xe_devcoredump(q, job, 1533 + primary->guc->id); 1534 + xe_devcoredump(primary, job, 1581 1535 "Schedule disable failed to respond, guc_id=%d, ret=%d, guc_read=%d", 1582 - q->guc->id, ret, xe_guc_read_stopped(guc)); 1583 - xe_gt_reset_async(q->gt); 1536 + primary->guc->id, ret, xe_guc_read_stopped(guc)); 1537 + xe_gt_reset_async(primary->gt); 1584 1538 xe_sched_tdr_queue_imm(sched); 1585 1539 goto rearm; 1586 1540 } ··· 1626 1580 drm_sched_for_each_pending_job(tmp_job, &sched->base, NULL) 1627 1581 xe_sched_job_set_error(to_xe_sched_job(tmp_job), -ECANCELED); 1628 1582 1629 - xe_sched_submission_start(sched); 1630 - 1631 - if (xe_exec_queue_is_multi_queue(q)) 1583 + if (xe_exec_queue_is_multi_queue(q)) { 1584 + xe_guc_exec_queue_group_start(q); 1632 1585 xe_guc_exec_queue_group_trigger_cleanup(q); 1633 - else 1586 + } else { 1587 + xe_sched_submission_start(sched); 1634 1588 xe_guc_exec_queue_trigger_cleanup(q); 1589 + } 1635 1590 1636 1591 /* 1637 1592 * We want the job added back to the pending list so it gets freed; this ··· 1646 1599 * but there is not currently an easy way to do in DRM scheduler. With 1647 1600 * some thought, do this in a follow up. 
1648 1601 */ 1649 - xe_sched_submission_start(sched); 1602 + if (xe_exec_queue_is_multi_queue(q)) 1603 + xe_guc_exec_queue_group_start(q); 1604 + else 1605 + xe_sched_submission_start(sched); 1650 1606 handle_vf_resume: 1651 1607 return DRM_GPU_SCHED_STAT_NO_HANG; 1652 1608 } ··· 1812 1762 since_resume_ms; 1813 1763 1814 1764 if (wait_ms > 0 && q->guc->resume_time) 1815 - relaxed_ms_sleep(wait_ms); 1765 + xe_sleep_relaxed_ms(wait_ms); 1816 1766 1817 1767 set_exec_queue_suspended(q); 1818 1768 disable_scheduling(q, false); ··· 2015 1965 2016 1966 INIT_LIST_HEAD(&q->multi_queue.link); 2017 1967 mutex_lock(&group->list_lock); 1968 + if (group->stopped) 1969 + WRITE_ONCE(q->guc->sched.base.pause_submit, true); 2018 1970 list_add_tail(&q->multi_queue.link, &group->list); 2019 1971 mutex_unlock(&group->list_lock); 2020 1972 } ··· 2163 2111 2164 2112 xe_gt_assert(guc_to_gt(exec_queue_to_guc(q)), xe_exec_queue_is_multi_queue(q)); 2165 2113 2166 - if (q->multi_queue.priority == priority || 2167 - exec_queue_killed_or_banned_or_wedged(q)) 2114 + if (exec_queue_killed_or_banned_or_wedged(q)) 2168 2115 return 0; 2169 2116 2170 2117 msg = kmalloc_obj(*msg); 2171 2118 if (!msg) 2172 2119 return -ENOMEM; 2173 2120 2174 - q->multi_queue.priority = priority; 2121 + scoped_guard(spinlock, &q->multi_queue.lock) { 2122 + if (q->multi_queue.priority == priority) { 2123 + kfree(msg); 2124 + return 0; 2125 + } 2126 + 2127 + q->multi_queue.priority = priority; 2128 + } 2129 + 2175 2130 guc_exec_queue_add_msg(q, msg, SET_MULTI_QUEUE_PRIORITY); 2176 2131 2177 2132 return 0; ··· 2265 2206 return exec_queue_reset(q) || exec_queue_killed_or_banned_or_wedged(q); 2266 2207 } 2267 2208 2209 + static bool guc_exec_queue_active(struct xe_exec_queue *q) 2210 + { 2211 + struct xe_exec_queue *primary = xe_exec_queue_multi_queue_primary(q); 2212 + 2213 + return exec_queue_enabled(primary) && 2214 + !exec_queue_pending_disable(primary); 2215 + } 2216 + 2268 2217 /* 2269 2218 * All of these functions are an abstraction layer which other parts of Xe can 2270 2219 * use to trap into the GuC backend. All of these functions, aside from init, ··· 2292 2225 .suspend_wait = guc_exec_queue_suspend_wait, 2293 2226 .resume = guc_exec_queue_resume, 2294 2227 .reset_status = guc_exec_queue_reset_status, 2228 + .active = guc_exec_queue_active, 2295 2229 }; 2296 2230 2297 2231 static void guc_exec_queue_stop(struct xe_guc *guc, struct xe_exec_queue *q)
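In the wq_wait_for_space() hunk above, the open-coded msleep()-plus-doubling loop is replaced by a call to xe_sleep_exponential_ms(&sleep_period_ms, 64), which returns the time slept so the caller can keep its running total. The helper itself is not part of this hunk, so the sketch below is only a minimal userspace model of the behaviour the call site relies on; sleep_exponential_ms() is an invented name and usleep() stands in for msleep().

#include <unistd.h>

/* Sleep for *period_ms, then double the period up to max_ms; return the
 * time slept so the caller can accumulate a total. */
static unsigned int sleep_exponential_ms(unsigned int *period_ms,
					 unsigned int max_ms)
{
	unsigned int slept = *period_ms;

	usleep((useconds_t)slept * 1000);	/* stand-in for msleep() */
	if (*period_ms < max_ms)
		*period_ms <<= 1;		/* 1, 2, 4, ... capped at max_ms */

	return slept;
}

A caller mirroring the hunk would then do: sleep_total_ms += sleep_exponential_ms(&sleep_period_ms, 64);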
+195 -47
drivers/gpu/drm/xe/xe_guc_tlb_inval.c
··· 6 6 #include "abi/guc_actions_abi.h" 7 7 8 8 #include "xe_device.h" 9 + #include "xe_exec_queue.h" 10 + #include "xe_exec_queue_types.h" 9 11 #include "xe_gt_stats.h" 10 12 #include "xe_gt_types.h" 11 13 #include "xe_guc.h" 12 14 #include "xe_guc_ct.h" 15 + #include "xe_guc_exec_queue_types.h" 13 16 #include "xe_guc_tlb_inval.h" 14 17 #include "xe_force_wake.h" 15 18 #include "xe_mmio.h" 16 19 #include "xe_sa.h" 17 20 #include "xe_tlb_inval.h" 21 + #include "xe_vm.h" 18 22 19 23 #include "regs/xe_guc_regs.h" 20 24 ··· 115 111 G2H_LEN_DW_PAGE_RECLAMATION, 1); 116 112 } 117 113 114 + static u64 normalize_invalidation_range(struct xe_gt *gt, u64 *start, u64 *end) 115 + { 116 + u64 orig_start = *start; 117 + u64 length = *end - *start; 118 + u64 align; 119 + 120 + if (length < SZ_4K) 121 + length = SZ_4K; 122 + 123 + align = roundup_pow_of_two(length); 124 + *start = ALIGN_DOWN(*start, align); 125 + *end = ALIGN(*end, align); 126 + length = align; 127 + while (*start + length < *end) { 128 + length <<= 1; 129 + *start = ALIGN_DOWN(orig_start, length); 130 + } 131 + 132 + if (length >= SZ_2M) { 133 + length = max_t(u64, SZ_16M, length); 134 + *start = ALIGN_DOWN(orig_start, length); 135 + } 136 + 137 + xe_gt_assert(gt, length >= SZ_4K); 138 + xe_gt_assert(gt, is_power_of_2(length)); 139 + xe_gt_assert(gt, !(length & GENMASK(ilog2(SZ_16M) - 1, 140 + ilog2(SZ_2M) + 1))); 141 + xe_gt_assert(gt, IS_ALIGNED(*start, length)); 142 + 143 + return length; 144 + } 145 + 118 146 /* 119 147 * Ensure that roundup_pow_of_two(length) doesn't overflow. 120 148 * Note that roundup_pow_of_two() operates on unsigned long, ··· 154 118 */ 155 119 #define MAX_RANGE_TLB_INVALIDATION_LENGTH (rounddown_pow_of_two(ULONG_MAX)) 156 120 157 - static int send_tlb_inval_ppgtt(struct xe_tlb_inval *tlb_inval, u32 seqno, 158 - u64 start, u64 end, u32 asid, 121 + static int send_tlb_inval_ppgtt(struct xe_guc *guc, u32 seqno, u64 start, 122 + u64 end, u32 id, u32 type, 159 123 struct drm_suballoc *prl_sa) 160 124 { 161 125 #define MAX_TLB_INVALIDATION_LEN 7 162 - struct xe_guc *guc = tlb_inval->private; 163 126 struct xe_gt *gt = guc_to_gt(guc); 127 + struct xe_device *xe = guc_to_xe(guc); 164 128 u32 action[MAX_TLB_INVALIDATION_LEN]; 165 129 u64 length = end - start; 166 130 int len = 0, err; 167 131 168 - if (guc_to_xe(guc)->info.force_execlist) 169 - return -ECANCELED; 132 + xe_gt_assert(gt, (type == XE_GUC_TLB_INVAL_PAGE_SELECTIVE && 133 + !xe->info.has_ctx_tlb_inval) || 134 + (type == XE_GUC_TLB_INVAL_PAGE_SELECTIVE_CTX && 135 + xe->info.has_ctx_tlb_inval)); 170 136 171 137 action[len++] = XE_GUC_ACTION_TLB_INVALIDATION; 172 138 action[len++] = !prl_sa ? seqno : TLB_INVALIDATION_SEQNO_INVALID; ··· 176 138 length > MAX_RANGE_TLB_INVALIDATION_LENGTH) { 177 139 action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL); 178 140 } else { 179 - u64 orig_start = start; 180 - u64 align; 181 - 182 - if (length < SZ_4K) 183 - length = SZ_4K; 184 - 185 - /* 186 - * We need to invalidate a higher granularity if start address 187 - * is not aligned to length. When start is not aligned with 188 - * length we need to find the length large enough to create an 189 - * address mask covering the required range. 
190 - */ 191 - align = roundup_pow_of_two(length); 192 - start = ALIGN_DOWN(start, align); 193 - end = ALIGN(end, align); 194 - length = align; 195 - while (start + length < end) { 196 - length <<= 1; 197 - start = ALIGN_DOWN(orig_start, length); 198 - } 199 - 200 - /* 201 - * Minimum invalidation size for a 2MB page that the hardware 202 - * expects is 16MB 203 - */ 204 - if (length >= SZ_2M) { 205 - length = max_t(u64, SZ_16M, length); 206 - start = ALIGN_DOWN(orig_start, length); 207 - } 208 - 209 - xe_gt_assert(gt, length >= SZ_4K); 210 - xe_gt_assert(gt, is_power_of_2(length)); 211 - xe_gt_assert(gt, !(length & GENMASK(ilog2(SZ_16M) - 1, 212 - ilog2(SZ_2M) + 1))); 213 - xe_gt_assert(gt, IS_ALIGNED(start, length)); 141 + u64 normalize_len = normalize_invalidation_range(gt, &start, 142 + &end); 143 + bool need_flush = !prl_sa && 144 + seqno != TLB_INVALIDATION_SEQNO_INVALID; 214 145 215 146 /* Flush on NULL case, Media is not required to modify flush due to no PPC so NOP */ 216 - action[len++] = MAKE_INVAL_OP_FLUSH(XE_GUC_TLB_INVAL_PAGE_SELECTIVE, !prl_sa); 217 - action[len++] = asid; 147 + action[len++] = MAKE_INVAL_OP_FLUSH(type, need_flush); 148 + action[len++] = id; 218 149 action[len++] = lower_32_bits(start); 219 150 action[len++] = upper_32_bits(start); 220 - action[len++] = ilog2(length) - ilog2(SZ_4K); 151 + action[len++] = ilog2(normalize_len) - ilog2(SZ_4K); 221 152 } 222 153 223 154 xe_gt_assert(gt, len <= MAX_TLB_INVALIDATION_LEN); 155 + #undef MAX_TLB_INVALIDATION_LEN 224 156 225 157 err = send_tlb_inval(guc, action, len); 226 - if (!err && prl_sa) 158 + if (!err && prl_sa) { 159 + xe_gt_assert(gt, seqno != TLB_INVALIDATION_SEQNO_INVALID); 227 160 err = send_page_reclaim(guc, seqno, xe_sa_bo_gpu_addr(prl_sa)); 161 + } 162 + return err; 163 + } 164 + 165 + static int send_tlb_inval_asid_ppgtt(struct xe_tlb_inval *tlb_inval, u32 seqno, 166 + u64 start, u64 end, u32 asid, 167 + struct drm_suballoc *prl_sa) 168 + { 169 + struct xe_guc *guc = tlb_inval->private; 170 + 171 + lockdep_assert_held(&tlb_inval->seqno_lock); 172 + 173 + if (guc_to_xe(guc)->info.force_execlist) 174 + return -ECANCELED; 175 + 176 + return send_tlb_inval_ppgtt(guc, seqno, start, end, asid, 177 + XE_GUC_TLB_INVAL_PAGE_SELECTIVE, prl_sa); 178 + } 179 + 180 + static int send_tlb_inval_ctx_ppgtt(struct xe_tlb_inval *tlb_inval, u32 seqno, 181 + u64 start, u64 end, u32 asid, 182 + struct drm_suballoc *prl_sa) 183 + { 184 + struct xe_guc *guc = tlb_inval->private; 185 + struct xe_device *xe = guc_to_xe(guc); 186 + struct xe_exec_queue *q, *next, *last_q = NULL; 187 + struct xe_vm *vm; 188 + LIST_HEAD(tlb_inval_list); 189 + int err = 0, id = guc_to_gt(guc)->info.id; 190 + 191 + lockdep_assert_held(&tlb_inval->seqno_lock); 192 + 193 + if (xe->info.force_execlist) 194 + return -ECANCELED; 195 + 196 + vm = xe_device_asid_to_vm(xe, asid); 197 + if (IS_ERR(vm)) 198 + return PTR_ERR(vm); 199 + 200 + down_read(&vm->exec_queues.lock); 201 + 202 + /* 203 + * XXX: Randomly picking a threshold for now. This will need to be 204 + * tuned based on expected UMD queue counts and performance profiling. 
205 + */ 206 + #define EXEC_QUEUE_COUNT_FULL_THRESHOLD 8 207 + if (vm->exec_queues.count[id] >= EXEC_QUEUE_COUNT_FULL_THRESHOLD) { 208 + u32 action[] = { 209 + XE_GUC_ACTION_TLB_INVALIDATION, 210 + seqno, 211 + MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL), 212 + }; 213 + 214 + err = send_tlb_inval(guc, action, ARRAY_SIZE(action)); 215 + goto err_unlock; 216 + } 217 + #undef EXEC_QUEUE_COUNT_FULL_THRESHOLD 218 + 219 + /* 220 + * Move exec queues to a temporary list to issue invalidations. The exec 221 + * queue must active and a reference must be taken to prevent concurrent 222 + * deregistrations. 223 + * 224 + * List modification is safe because we hold 'vm->exec_queues.lock' for 225 + * reading, which prevents external modifications. Using a per-GT list 226 + * is also safe since 'tlb_inval->seqno_lock' ensures no other GT users 227 + * can enter this code path. 228 + */ 229 + list_for_each_entry_safe(q, next, &vm->exec_queues.list[id], 230 + vm_exec_queue_link) { 231 + if (q->ops->active(q) && xe_exec_queue_get_unless_zero(q)) { 232 + last_q = q; 233 + list_move_tail(&q->vm_exec_queue_link, &tlb_inval_list); 234 + } 235 + } 236 + 237 + if (!last_q) { 238 + /* 239 + * We can't break fence ordering for TLB invalidation jobs, if 240 + * TLB invalidations are inflight issue a dummy invalidation to 241 + * maintain ordering. Nor can we move safely the seqno_recv when 242 + * returning -ECANCELED if TLB invalidations are in flight. Use 243 + * GGTT invalidation as dummy invalidation given ASID 244 + * invalidations are unsupported here. 245 + */ 246 + if (xe_tlb_inval_idle(tlb_inval)) 247 + err = -ECANCELED; 248 + else 249 + err = send_tlb_inval_ggtt(tlb_inval, seqno); 250 + goto err_unlock; 251 + } 252 + 253 + list_for_each_entry_safe(q, next, &tlb_inval_list, vm_exec_queue_link) { 254 + struct drm_suballoc *__prl_sa = NULL; 255 + int __seqno = TLB_INVALIDATION_SEQNO_INVALID; 256 + u32 type = XE_GUC_TLB_INVAL_PAGE_SELECTIVE_CTX; 257 + 258 + xe_assert(xe, q->vm == vm); 259 + 260 + if (err) 261 + goto unref; 262 + 263 + if (last_q == q) { 264 + __prl_sa = prl_sa; 265 + __seqno = seqno; 266 + } 267 + 268 + err = send_tlb_inval_ppgtt(guc, __seqno, start, end, 269 + q->guc->id, type, __prl_sa); 270 + 271 + unref: 272 + /* 273 + * Must always return exec queue to original list / drop 274 + * reference 275 + */ 276 + list_move_tail(&q->vm_exec_queue_link, 277 + &vm->exec_queues.list[id]); 278 + xe_exec_queue_put(q); 279 + } 280 + 281 + err_unlock: 282 + up_read(&vm->exec_queues.lock); 283 + xe_vm_put(vm); 284 + 228 285 return err; 229 286 } 230 287 ··· 350 217 return hw_tlb_timeout + 2 * delay; 351 218 } 352 219 353 - static const struct xe_tlb_inval_ops guc_tlb_inval_ops = { 220 + static const struct xe_tlb_inval_ops guc_tlb_inval_asid_ops = { 354 221 .all = send_tlb_inval_all, 355 222 .ggtt = send_tlb_inval_ggtt, 356 - .ppgtt = send_tlb_inval_ppgtt, 223 + .ppgtt = send_tlb_inval_asid_ppgtt, 224 + .initialized = tlb_inval_initialized, 225 + .flush = tlb_inval_flush, 226 + .timeout_delay = tlb_inval_timeout_delay, 227 + }; 228 + 229 + static const struct xe_tlb_inval_ops guc_tlb_inval_ctx_ops = { 230 + .ggtt = send_tlb_inval_ggtt, 231 + .all = send_tlb_inval_all, 232 + .ppgtt = send_tlb_inval_ctx_ppgtt, 357 233 .initialized = tlb_inval_initialized, 358 234 .flush = tlb_inval_flush, 359 235 .timeout_delay = tlb_inval_timeout_delay, ··· 379 237 void xe_guc_tlb_inval_init_early(struct xe_guc *guc, 380 238 struct xe_tlb_inval *tlb_inval) 381 239 { 240 + struct xe_device *xe = guc_to_xe(guc); 241 + 382 242 
tlb_inval->private = guc; 383 - tlb_inval->ops = &guc_tlb_inval_ops; 243 + 244 + if (xe->info.has_ctx_tlb_inval) 245 + tlb_inval->ops = &guc_tlb_inval_ctx_ops; 246 + else 247 + tlb_inval->ops = &guc_tlb_inval_asid_ops; 384 248 } 385 249 386 250 /**
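normalize_invalidation_range() above widens an arbitrary [start, end) span until a single naturally aligned power-of-two block covers it, and bumps the granularity to at least 16M once the block reaches 2M. The sketch below is a userspace model of that widening with the kernel helpers spelled out; the addresses in main() are made up purely to show the effect.

#include <stdint.h>
#include <stdio.h>

#define SZ_4K  0x1000ULL
#define SZ_2M  0x200000ULL
#define SZ_16M 0x1000000ULL

static uint64_t align_down(uint64_t x, uint64_t a) { return x & ~(a - 1); }
static uint64_t align_up(uint64_t x, uint64_t a) { return (x + a - 1) & ~(a - 1); }

static uint64_t roundup_pow_of_two(uint64_t x)
{
	uint64_t p = 1;

	while (p < x)
		p <<= 1;
	return p;
}

static uint64_t normalize_range(uint64_t *start, uint64_t *end)
{
	uint64_t orig_start = *start;
	uint64_t length = *end - *start;
	uint64_t align;

	if (length < SZ_4K)
		length = SZ_4K;

	/* Grow until one aligned power-of-two block covers [start, end) */
	align = roundup_pow_of_two(length);
	*start = align_down(*start, align);
	*end = align_up(*end, align);
	length = align;
	while (*start + length < *end) {
		length <<= 1;
		*start = align_down(orig_start, length);
	}

	/* Hardware wants at least 16M granularity once 2M pages are in play */
	if (length >= SZ_2M) {
		if (length < SZ_16M)
			length = SZ_16M;
		*start = align_down(orig_start, length);
	}

	return length;
}

int main(void)
{
	uint64_t start = 0x1f000, end = 0x21000;
	uint64_t len = normalize_range(&start, &end);

	/* An 8K span straddling the 128K boundary widens to a 256K block at 0 */
	printf("start=%#llx len=%#llx\n",
	       (unsigned long long)start, (unsigned long long)len);
	return 0;
}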
+4 -2
drivers/gpu/drm/xe/xe_hw_engine.c
··· 408 408 }, 409 409 }; 410 410 411 - xe_rtp_process_to_sr(&ctx, lrc_setup, ARRAY_SIZE(lrc_setup), &hwe->reg_lrc); 411 + xe_rtp_process_to_sr(&ctx, lrc_setup, ARRAY_SIZE(lrc_setup), 412 + &hwe->reg_lrc, true); 412 413 } 413 414 414 415 static void ··· 473 472 }, 474 473 }; 475 474 476 - xe_rtp_process_to_sr(&ctx, engine_entries, ARRAY_SIZE(engine_entries), &hwe->reg_sr); 475 + xe_rtp_process_to_sr(&ctx, engine_entries, ARRAY_SIZE(engine_entries), 476 + &hwe->reg_sr, false); 477 477 } 478 478 479 479 static const struct engine_info *find_engine_info(enum xe_engine_class class, int instance)
+2 -1
drivers/gpu/drm/xe/xe_hw_engine_group.c
··· 51 51 if (!group) 52 52 return ERR_PTR(-ENOMEM); 53 53 54 - group->resume_wq = alloc_workqueue("xe-resume-lr-jobs-wq", 0, 0); 54 + group->resume_wq = alloc_workqueue("xe-resume-lr-jobs-wq", WQ_PERCPU, 55 + 0); 55 56 if (!group->resume_wq) 56 57 return ERR_PTR(-ENOMEM); 57 58
+1 -1
drivers/gpu/drm/xe/xe_i2c.c
··· 27 27 #include "regs/xe_i2c_regs.h" 28 28 #include "regs/xe_irq_regs.h" 29 29 30 - #include "xe_device_types.h" 30 + #include "xe_device.h" 31 31 #include "xe_i2c.h" 32 32 #include "xe_mmio.h" 33 33 #include "xe_sriov.h"
+17
drivers/gpu/drm/xe/xe_lmtt.c
··· 57 57 return BIT_ULL(lmtt->ops->lmtt_pte_shift(0)); 58 58 } 59 59 60 + /** 61 + * xe_lmtt_page_size() - Get LMTT page size. 62 + * @lmtt: the &xe_lmtt 63 + * 64 + * This function shall be called only by PF. 65 + * 66 + * Return: LMTT page size. 67 + */ 68 + u64 xe_lmtt_page_size(struct xe_lmtt *lmtt) 69 + { 70 + lmtt_assert(lmtt, IS_SRIOV_PF(lmtt_to_xe(lmtt))); 71 + lmtt_assert(lmtt, xe_device_has_lmtt(lmtt_to_xe(lmtt))); 72 + lmtt_assert(lmtt, lmtt->ops); 73 + 74 + return lmtt_page_size(lmtt); 75 + } 76 + 60 77 static struct xe_lmtt_pt *lmtt_pt_alloc(struct xe_lmtt *lmtt, unsigned int level) 61 78 { 62 79 unsigned int num_entries = level ? lmtt->ops->lmtt_pte_num(level) : 0;
+1
drivers/gpu/drm/xe/xe_lmtt.h
··· 20 20 int xe_lmtt_populate_pages(struct xe_lmtt *lmtt, unsigned int vfid, struct xe_bo *bo, u64 offset); 21 21 void xe_lmtt_drop_pages(struct xe_lmtt *lmtt, unsigned int vfid); 22 22 u64 xe_lmtt_estimate_pt_size(struct xe_lmtt *lmtt, u64 size); 23 + u64 xe_lmtt_page_size(struct xe_lmtt *lmtt); 23 24 #else 24 25 static inline int xe_lmtt_init(struct xe_lmtt *lmtt) { return 0; } 25 26 static inline void xe_lmtt_init_hw(struct xe_lmtt *lmtt) { }
+215 -74
drivers/gpu/drm/xe/xe_lrc.c
··· 113 113 /* Engine context image */ 114 114 switch (class) { 115 115 case XE_ENGINE_CLASS_RENDER: 116 - if (GRAPHICS_VER(xe) >= 20) 116 + if (GRAPHICS_VERx100(xe) >= 3510) 117 + size += 7 * SZ_4K; 118 + else if (GRAPHICS_VER(xe) >= 20) 117 119 size += 3 * SZ_4K; 118 120 else 119 121 size += 13 * SZ_4K; 120 122 break; 121 123 case XE_ENGINE_CLASS_COMPUTE: 122 - if (GRAPHICS_VER(xe) >= 20) 124 + if (GRAPHICS_VERx100(xe) >= 3510) 125 + size += 5 * SZ_4K; 126 + else if (GRAPHICS_VER(xe) >= 20) 123 127 size += 2 * SZ_4K; 124 128 else 125 129 size += 13 * SZ_4K; ··· 715 711 #define __xe_lrc_pphwsp_offset xe_lrc_pphwsp_offset 716 712 #define __xe_lrc_regs_offset xe_lrc_regs_offset 717 713 718 - #define LRC_SEQNO_PPHWSP_OFFSET 512 719 - #define LRC_START_SEQNO_PPHWSP_OFFSET (LRC_SEQNO_PPHWSP_OFFSET + 8) 720 - #define LRC_CTX_JOB_TIMESTAMP_OFFSET (LRC_START_SEQNO_PPHWSP_OFFSET + 8) 714 + #define LRC_CTX_JOB_TIMESTAMP_OFFSET 512 721 715 #define LRC_ENGINE_ID_PPHWSP_OFFSET 1024 722 716 #define LRC_PARALLEL_PPHWSP_OFFSET 2048 717 + 718 + #define LRC_SEQNO_OFFSET 0 719 + #define LRC_START_SEQNO_OFFSET (LRC_SEQNO_OFFSET + 8) 723 720 724 721 u32 xe_lrc_regs_offset(struct xe_lrc *lrc) 725 722 { ··· 748 743 749 744 static inline u32 __xe_lrc_seqno_offset(struct xe_lrc *lrc) 750 745 { 751 - /* The seqno is stored in the driver-defined portion of PPHWSP */ 752 - return xe_lrc_pphwsp_offset(lrc) + LRC_SEQNO_PPHWSP_OFFSET; 746 + return LRC_SEQNO_OFFSET; 753 747 } 754 748 755 749 static inline u32 __xe_lrc_start_seqno_offset(struct xe_lrc *lrc) 756 750 { 757 - /* The start seqno is stored in the driver-defined portion of PPHWSP */ 758 - return xe_lrc_pphwsp_offset(lrc) + LRC_START_SEQNO_PPHWSP_OFFSET; 751 + return LRC_START_SEQNO_OFFSET; 759 752 } 760 753 761 754 static u32 __xe_lrc_ctx_job_timestamp_offset(struct xe_lrc *lrc) ··· 804 801 return xe_bo_size(lrc->bo) - LRC_WA_BB_SIZE; 805 802 } 806 803 807 - #define DECL_MAP_ADDR_HELPERS(elem) \ 804 + #define DECL_MAP_ADDR_HELPERS(elem, bo_expr) \ 808 805 static inline struct iosys_map __xe_lrc_##elem##_map(struct xe_lrc *lrc) \ 809 806 { \ 810 - struct iosys_map map = lrc->bo->vmap; \ 807 + struct xe_bo *bo = (bo_expr); \ 808 + struct iosys_map map = bo->vmap; \ 811 809 \ 812 810 xe_assert(lrc_to_xe(lrc), !iosys_map_is_null(&map)); \ 813 811 iosys_map_incr(&map, __xe_lrc_##elem##_offset(lrc)); \ ··· 816 812 } \ 817 813 static inline u32 __maybe_unused __xe_lrc_##elem##_ggtt_addr(struct xe_lrc *lrc) \ 818 814 { \ 819 - return xe_bo_ggtt_addr(lrc->bo) + __xe_lrc_##elem##_offset(lrc); \ 815 + struct xe_bo *bo = (bo_expr); \ 816 + \ 817 + return xe_bo_ggtt_addr(bo) + __xe_lrc_##elem##_offset(lrc); \ 820 818 } \ 821 819 822 - DECL_MAP_ADDR_HELPERS(ring) 823 - DECL_MAP_ADDR_HELPERS(pphwsp) 824 - DECL_MAP_ADDR_HELPERS(seqno) 825 - DECL_MAP_ADDR_HELPERS(regs) 826 - DECL_MAP_ADDR_HELPERS(start_seqno) 827 - DECL_MAP_ADDR_HELPERS(ctx_job_timestamp) 828 - DECL_MAP_ADDR_HELPERS(ctx_timestamp) 829 - DECL_MAP_ADDR_HELPERS(ctx_timestamp_udw) 830 - DECL_MAP_ADDR_HELPERS(parallel) 831 - DECL_MAP_ADDR_HELPERS(indirect_ring) 832 - DECL_MAP_ADDR_HELPERS(engine_id) 820 + DECL_MAP_ADDR_HELPERS(ring, lrc->bo) 821 + DECL_MAP_ADDR_HELPERS(pphwsp, lrc->bo) 822 + DECL_MAP_ADDR_HELPERS(seqno, lrc->seqno_bo) 823 + DECL_MAP_ADDR_HELPERS(regs, lrc->bo) 824 + DECL_MAP_ADDR_HELPERS(start_seqno, lrc->seqno_bo) 825 + DECL_MAP_ADDR_HELPERS(ctx_job_timestamp, lrc->bo) 826 + DECL_MAP_ADDR_HELPERS(ctx_timestamp, lrc->bo) 827 + DECL_MAP_ADDR_HELPERS(ctx_timestamp_udw, lrc->bo) 828 + 
DECL_MAP_ADDR_HELPERS(parallel, lrc->bo) 829 + DECL_MAP_ADDR_HELPERS(indirect_ring, lrc->bo) 830 + DECL_MAP_ADDR_HELPERS(engine_id, lrc->bo) 833 831 834 832 #undef DECL_MAP_ADDR_HELPERS 835 833 ··· 1038 1032 { 1039 1033 xe_hw_fence_ctx_finish(&lrc->fence_ctx); 1040 1034 xe_bo_unpin_map_no_vm(lrc->bo); 1035 + xe_bo_unpin_map_no_vm(lrc->seqno_bo); 1041 1036 } 1042 1037 1043 1038 /* ··· 1438 1431 lrc->desc |= FIELD_PREP(LRC_PRIORITY, xe_multi_queue_prio_to_lrc(lrc, priority)); 1439 1432 } 1440 1433 1441 - static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe, 1442 - struct xe_vm *vm, void *replay_state, u32 ring_size, 1443 - u16 msix_vec, 1444 - u32 init_flags) 1434 + static int xe_lrc_ctx_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe, struct xe_vm *vm, 1435 + void *replay_state, u16 msix_vec, u32 init_flags) 1445 1436 { 1446 1437 struct xe_gt *gt = hwe->gt; 1447 - const u32 lrc_size = xe_gt_lrc_size(gt, hwe->class); 1448 - u32 bo_size = ring_size + lrc_size + LRC_WA_BB_SIZE; 1449 1438 struct xe_tile *tile = gt_to_tile(gt); 1450 1439 struct xe_device *xe = gt_to_xe(gt); 1451 1440 struct iosys_map map; 1452 1441 u32 arb_enable; 1453 - u32 bo_flags; 1454 1442 int err; 1455 - 1456 - kref_init(&lrc->refcount); 1457 - lrc->gt = gt; 1458 - lrc->replay_size = xe_gt_lrc_hang_replay_size(gt, hwe->class); 1459 - lrc->size = lrc_size; 1460 - lrc->flags = 0; 1461 - lrc->ring.size = ring_size; 1462 - lrc->ring.tail = 0; 1463 - 1464 - if (gt_engine_needs_indirect_ctx(gt, hwe->class)) { 1465 - lrc->flags |= XE_LRC_FLAG_INDIRECT_CTX; 1466 - bo_size += LRC_INDIRECT_CTX_BO_SIZE; 1467 - } 1468 - 1469 - if (xe_gt_has_indirect_ring_state(gt)) 1470 - lrc->flags |= XE_LRC_FLAG_INDIRECT_RING_STATE; 1471 - 1472 - bo_flags = XE_BO_FLAG_VRAM_IF_DGFX(tile) | XE_BO_FLAG_GGTT | 1473 - XE_BO_FLAG_GGTT_INVALIDATE; 1474 - 1475 - if ((vm && vm->xef) || init_flags & XE_LRC_CREATE_USER_CTX) /* userspace */ 1476 - bo_flags |= XE_BO_FLAG_PINNED_LATE_RESTORE | XE_BO_FLAG_FORCE_USER_VRAM; 1477 - 1478 - lrc->bo = xe_bo_create_pin_map_novm(xe, tile, 1479 - bo_size, 1480 - ttm_bo_type_kernel, 1481 - bo_flags, false); 1482 - if (IS_ERR(lrc->bo)) 1483 - return PTR_ERR(lrc->bo); 1484 - 1485 - xe_hw_fence_ctx_init(&lrc->fence_ctx, hwe->gt, 1486 - hwe->fence_irq, hwe->name); 1487 1443 1488 1444 /* 1489 1445 * Init Per-Process of HW status Page, LRC / context state to known ··· 1459 1489 xe_map_memset(xe, &map, 0, 0, LRC_PPHWSP_SIZE); /* PPHWSP */ 1460 1490 xe_map_memcpy_to(xe, &map, LRC_PPHWSP_SIZE, 1461 1491 gt->default_lrc[hwe->class] + LRC_PPHWSP_SIZE, 1462 - lrc_size - LRC_PPHWSP_SIZE); 1492 + lrc->size - LRC_PPHWSP_SIZE); 1463 1493 if (replay_state) 1464 1494 xe_map_memcpy_to(xe, &map, LRC_PPHWSP_SIZE, 1465 1495 replay_state, lrc->replay_size); ··· 1467 1497 void *init_data = empty_lrc_data(hwe); 1468 1498 1469 1499 if (!init_data) { 1470 - err = -ENOMEM; 1471 - goto err_lrc_finish; 1500 + return -ENOMEM; 1472 1501 } 1473 1502 1474 - xe_map_memcpy_to(xe, &map, 0, init_data, lrc_size); 1503 + xe_map_memcpy_to(xe, &map, 0, init_data, lrc->size); 1475 1504 kfree(init_data); 1476 1505 } 1477 1506 1478 - if (vm) { 1507 + if (vm) 1479 1508 xe_lrc_set_ppgtt(lrc, vm); 1480 - 1481 - if (vm->xef) 1482 - xe_drm_client_add_bo(vm->xef->client, lrc->bo); 1483 - } 1484 1509 1485 1510 if (xe_device_has_msix(xe)) { 1486 1511 xe_lrc_write_ctx_reg(lrc, CTX_INT_STATUS_REPORT_PTR, ··· 1492 1527 xe_lrc_write_indirect_ctx_reg(lrc, INDIRECT_CTX_RING_START, 1493 1528 __xe_lrc_ring_ggtt_addr(lrc)); 1494 1529 xe_lrc_write_indirect_ctx_reg(lrc, 
INDIRECT_CTX_RING_START_UDW, 0); 1495 - xe_lrc_write_indirect_ctx_reg(lrc, INDIRECT_CTX_RING_HEAD, 0); 1530 + 1531 + /* Match head and tail pointers */ 1532 + xe_lrc_write_indirect_ctx_reg(lrc, INDIRECT_CTX_RING_HEAD, lrc->ring.tail); 1496 1533 xe_lrc_write_indirect_ctx_reg(lrc, INDIRECT_CTX_RING_TAIL, lrc->ring.tail); 1534 + 1497 1535 xe_lrc_write_indirect_ctx_reg(lrc, INDIRECT_CTX_RING_CTL, 1498 1536 RING_CTL_SIZE(lrc->ring.size) | RING_VALID); 1499 1537 } else { 1500 1538 xe_lrc_write_ctx_reg(lrc, CTX_RING_START, __xe_lrc_ring_ggtt_addr(lrc)); 1501 - xe_lrc_write_ctx_reg(lrc, CTX_RING_HEAD, 0); 1539 + 1540 + /* Match head and tail pointers */ 1541 + xe_lrc_write_ctx_reg(lrc, CTX_RING_HEAD, lrc->ring.tail); 1502 1542 xe_lrc_write_ctx_reg(lrc, CTX_RING_TAIL, lrc->ring.tail); 1543 + 1503 1544 xe_lrc_write_ctx_reg(lrc, CTX_RING_CTL, 1504 1545 RING_CTL_SIZE(lrc->ring.size) | RING_VALID); 1505 1546 } ··· 1554 1583 1555 1584 err = setup_wa_bb(lrc, hwe); 1556 1585 if (err) 1557 - goto err_lrc_finish; 1586 + return err; 1558 1587 1559 1588 err = setup_indirect_ctx(lrc, hwe); 1589 + 1590 + return err; 1591 + } 1592 + 1593 + static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe, struct xe_vm *vm, 1594 + void *replay_state, u32 ring_size, u16 msix_vec, u32 init_flags) 1595 + { 1596 + struct xe_gt *gt = hwe->gt; 1597 + const u32 lrc_size = xe_gt_lrc_size(gt, hwe->class); 1598 + u32 bo_size = ring_size + lrc_size + LRC_WA_BB_SIZE; 1599 + struct xe_tile *tile = gt_to_tile(gt); 1600 + struct xe_device *xe = gt_to_xe(gt); 1601 + struct xe_bo *bo; 1602 + u32 bo_flags; 1603 + int err; 1604 + 1605 + kref_init(&lrc->refcount); 1606 + lrc->gt = gt; 1607 + lrc->replay_size = xe_gt_lrc_hang_replay_size(gt, hwe->class); 1608 + lrc->size = lrc_size; 1609 + lrc->flags = 0; 1610 + lrc->ring.size = ring_size; 1611 + lrc->ring.tail = 0; 1612 + 1613 + if (gt_engine_needs_indirect_ctx(gt, hwe->class)) { 1614 + lrc->flags |= XE_LRC_FLAG_INDIRECT_CTX; 1615 + bo_size += LRC_INDIRECT_CTX_BO_SIZE; 1616 + } 1617 + 1618 + if (xe_gt_has_indirect_ring_state(gt)) 1619 + lrc->flags |= XE_LRC_FLAG_INDIRECT_RING_STATE; 1620 + 1621 + bo_flags = XE_BO_FLAG_VRAM_IF_DGFX(tile) | XE_BO_FLAG_GGTT | 1622 + XE_BO_FLAG_GGTT_INVALIDATE; 1623 + 1624 + if ((vm && vm->xef) || init_flags & XE_LRC_CREATE_USER_CTX) /* userspace */ 1625 + bo_flags |= XE_BO_FLAG_PINNED_LATE_RESTORE | XE_BO_FLAG_FORCE_USER_VRAM; 1626 + 1627 + bo = xe_bo_create_pin_map_novm(xe, tile, bo_size, 1628 + ttm_bo_type_kernel, 1629 + bo_flags, false); 1630 + if (IS_ERR(lrc->bo)) 1631 + return PTR_ERR(lrc->bo); 1632 + 1633 + lrc->bo = bo; 1634 + 1635 + bo = xe_bo_create_pin_map_novm(xe, tile, PAGE_SIZE, 1636 + ttm_bo_type_kernel, 1637 + XE_BO_FLAG_GGTT | 1638 + XE_BO_FLAG_GGTT_INVALIDATE | 1639 + XE_BO_FLAG_SYSTEM, false); 1640 + if (IS_ERR(bo)) { 1641 + err = PTR_ERR(bo); 1642 + goto err_lrc_finish; 1643 + } 1644 + lrc->seqno_bo = bo; 1645 + 1646 + xe_hw_fence_ctx_init(&lrc->fence_ctx, hwe->gt, 1647 + hwe->fence_irq, hwe->name); 1648 + 1649 + err = xe_lrc_ctx_init(lrc, hwe, vm, replay_state, msix_vec, init_flags); 1560 1650 if (err) 1561 1651 goto err_lrc_finish; 1652 + 1653 + if (vm && vm->xef) 1654 + xe_drm_client_add_bo(vm->xef->client, lrc->bo); 1562 1655 1563 1656 return 0; 1564 1657 ··· 2001 1966 MATCH(PIPELINE_SELECT); 2002 1967 2003 1968 MATCH3D(3DSTATE_DRAWING_RECTANGLE_FAST); 1969 + MATCH3D(3DSTATE_CUSTOM_SAMPLE_PATTERN); 2004 1970 MATCH3D(3DSTATE_CLEAR_PARAMS); 2005 1971 MATCH3D(3DSTATE_DEPTH_BUFFER); 2006 1972 MATCH3D(3DSTATE_STENCIL_BUFFER); ··· 
2085 2049 MATCH3D(3DSTATE_SBE_MESH); 2086 2050 MATCH3D(3DSTATE_CPSIZE_CONTROL_BUFFER); 2087 2051 MATCH3D(3DSTATE_COARSE_PIXEL); 2052 + MATCH3D(3DSTATE_MESH_SHADER_DATA_EXT); 2053 + MATCH3D(3DSTATE_TASK_SHADER_DATA_EXT); 2054 + MATCH3D(3DSTATE_VIEWPORT_STATE_POINTERS_CC_2); 2055 + MATCH3D(3DSTATE_CC_STATE_POINTERS_2); 2056 + MATCH3D(3DSTATE_SCISSOR_STATE_POINTERS_2); 2057 + MATCH3D(3DSTATE_BLEND_STATE_POINTERS_2); 2058 + MATCH3D(3DSTATE_VIEWPORT_STATE_POINTERS_SF_CLIP_2); 2088 2059 2089 2060 MATCH3D(3DSTATE_DRAWING_RECTANGLE); 2061 + MATCH3D(3DSTATE_URB_MEMORY); 2090 2062 MATCH3D(3DSTATE_CHROMA_KEY); 2091 2063 MATCH3D(3DSTATE_POLY_STIPPLE_OFFSET); 2092 2064 MATCH3D(3DSTATE_POLY_STIPPLE_PATTERN); ··· 2114 2070 MATCH3D(3DSTATE_SUBSLICE_HASH_TABLE); 2115 2071 MATCH3D(3DSTATE_SLICE_TABLE_STATE_POINTERS); 2116 2072 MATCH3D(3DSTATE_PTBR_TILE_PASS_INFO); 2073 + MATCH3D(3DSTATE_SLICE_TABLE_STATE_POINTER_2); 2117 2074 2118 2075 default: 2119 2076 drm_printf(p, "[%#010x] unknown GFXPIPE command (pipeline=%#x, opcode=%#x, subopcode=%#x), likely %d dwords\n", ··· 2184 2139 dw += num_dw; 2185 2140 remaining_dw -= num_dw; 2186 2141 } 2142 + } 2143 + 2144 + /* 2145 + * Lookup the value of a register within the offset/value pairs of an 2146 + * MI_LOAD_REGISTER_IMM instruction. 2147 + * 2148 + * Return -ENOENT if the register is not present in the MI_LRI instruction. 2149 + */ 2150 + static int lookup_reg_in_mi_lri(u32 offset, u32 *value, 2151 + const u32 *dword_pair, int num_regs) 2152 + { 2153 + for (int i = 0; i < num_regs; i++) { 2154 + if (dword_pair[2 * i] == offset) { 2155 + *value = dword_pair[2 * i + 1]; 2156 + return 0; 2157 + } 2158 + } 2159 + 2160 + return -ENOENT; 2161 + } 2162 + 2163 + /* 2164 + * Lookup the value of a register in a specific engine type's default LRC. 2165 + * 2166 + * Return -EINVAL if the default LRC doesn't exist, or ENOENT if the register 2167 + * cannot be found in the default LRC. 2168 + */ 2169 + int xe_lrc_lookup_default_reg_value(struct xe_gt *gt, 2170 + enum xe_engine_class hwe_class, 2171 + u32 offset, 2172 + u32 *value) 2173 + { 2174 + u32 *dw; 2175 + int remaining_dw, ret; 2176 + 2177 + if (!gt->default_lrc[hwe_class]) 2178 + return -EINVAL; 2179 + 2180 + /* 2181 + * Skip the beginning of the LRC since it contains the per-process 2182 + * hardware status page. 2183 + */ 2184 + dw = gt->default_lrc[hwe_class] + LRC_PPHWSP_SIZE; 2185 + remaining_dw = (xe_gt_lrc_size(gt, hwe_class) - LRC_PPHWSP_SIZE) / 4; 2186 + 2187 + while (remaining_dw > 0) { 2188 + u32 num_dw = instr_dw(*dw); 2189 + 2190 + if (num_dw > remaining_dw) 2191 + num_dw = remaining_dw; 2192 + 2193 + switch (*dw & XE_INSTR_CMD_TYPE) { 2194 + case XE_INSTR_MI: 2195 + switch (*dw & MI_OPCODE) { 2196 + case MI_BATCH_BUFFER_END: 2197 + /* End of LRC; register not found */ 2198 + return -ENOENT; 2199 + 2200 + case MI_NOOP: 2201 + case MI_TOPOLOGY_FILTER: 2202 + /* 2203 + * MI_NOOP and MI_TOPOLOGY_FILTER don't have 2204 + * a length field and are always 1-dword 2205 + * instructions. 2206 + */ 2207 + remaining_dw--; 2208 + dw++; 2209 + break; 2210 + 2211 + case MI_LOAD_REGISTER_IMM: 2212 + ret = lookup_reg_in_mi_lri(offset, value, 2213 + dw + 1, (num_dw - 1) / 2); 2214 + if (ret == 0) 2215 + return 0; 2216 + 2217 + fallthrough; 2218 + 2219 + default: 2220 + /* 2221 + * Jump to next instruction based on length 2222 + * field. 2223 + */ 2224 + remaining_dw -= num_dw; 2225 + dw += num_dw; 2226 + break; 2227 + } 2228 + break; 2229 + 2230 + default: 2231 + /* Jump to next instruction based on length field. 
*/ 2232 + remaining_dw -= num_dw; 2233 + dw += num_dw; 2234 + } 2235 + } 2236 + 2237 + return -ENOENT; 2187 2238 } 2188 2239 2189 2240 struct instr_state {
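xe_lrc_lookup_default_reg_value() added above walks the default LRC image instruction by instruction and, whenever it hits an MI_LOAD_REGISTER_IMM, scans the (offset, value) dword pairs that follow the header via lookup_reg_in_mi_lri(). The inner scan is simple enough to model standalone; the register offsets and values below are hypothetical and only illustrate the pair layout.

#include <stdint.h>
#include <stdio.h>

/* Payload of an MI_LRI: num_regs (offset, value) dword pairs after the header */
static int lookup_reg_in_lri(uint32_t offset, uint32_t *value,
			     const uint32_t *pairs, int num_regs)
{
	for (int i = 0; i < num_regs; i++) {
		if (pairs[2 * i] == offset) {
			*value = pairs[2 * i + 1];
			return 0;
		}
	}

	return -1;	/* stands in for -ENOENT */
}

int main(void)
{
	const uint32_t pairs[] = { 0x2244, 0x00000001, 0x22d0, 0x000f0000 };
	uint32_t val;

	if (!lookup_reg_in_lri(0x22d0, &val, pairs, 2))
		printf("0x22d0 = %#x\n", (unsigned int)val);	/* prints 0xf0000 */
	return 0;
}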
+6 -1
drivers/gpu/drm/xe/xe_lrc.h
··· 75 75 */ 76 76 static inline void xe_lrc_put(struct xe_lrc *lrc) 77 77 { 78 - kref_put(&lrc->refcount, xe_lrc_destroy); 78 + if (lrc) 79 + kref_put(&lrc->refcount, xe_lrc_destroy); 79 80 } 80 81 81 82 /** ··· 134 133 void xe_lrc_dump_default(struct drm_printer *p, 135 134 struct xe_gt *gt, 136 135 enum xe_engine_class); 136 + int xe_lrc_lookup_default_reg_value(struct xe_gt *gt, 137 + enum xe_engine_class hwe_class, 138 + u32 offset, 139 + u32 *value); 137 140 138 141 u32 *xe_lrc_emit_hwe_state_instructions(struct xe_exec_queue *q, u32 *cs); 139 142
+6
drivers/gpu/drm/xe/xe_lrc_types.h
··· 22 22 */ 23 23 struct xe_bo *bo; 24 24 25 + /** 26 + * @seqno_bo: Buffer object (memory) for seqno numbers. Always in system 27 + * memory as this is a CPU read, GPU write path object. 28 + */ 29 + struct xe_bo *seqno_bo; 30 + 25 31 /** @size: size of the lrc and optional indirect ring state */ 26 32 u32 size; 27 33
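This keeps the CPU-polled fence seqnos out of the VRAM-backed LRC image, so fence checks never have to read across the PCIe bus on dGPU; the seqno offsets in xe_lrc.c above are now relative to this dedicated BO rather than to the PPHWSP.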
+64 -55
drivers/gpu/drm/xe/xe_migrate.c
··· 25 25 #include "xe_exec_queue.h" 26 26 #include "xe_ggtt.h" 27 27 #include "xe_gt.h" 28 + #include "xe_gt_printk.h" 28 29 #include "xe_hw_engine.h" 29 30 #include "xe_lrc.h" 30 31 #include "xe_map.h" ··· 1149 1148 size -= src_L0; 1150 1149 } 1151 1150 1151 + bb = xe_bb_alloc(gt); 1152 + if (IS_ERR(bb)) 1153 + return PTR_ERR(bb); 1154 + 1152 1155 bb_pool = ctx->mem.ccs_bb_pool; 1153 - guard(mutex) (xe_sa_bo_swap_guard(bb_pool)); 1154 - xe_sa_bo_swap_shadow(bb_pool); 1156 + scoped_guard(mutex, xe_sa_bo_swap_guard(bb_pool)) { 1157 + xe_sa_bo_swap_shadow(bb_pool); 1155 1158 1156 - bb = xe_bb_ccs_new(gt, batch_size, read_write); 1157 - if (IS_ERR(bb)) { 1158 - drm_err(&xe->drm, "BB allocation failed.\n"); 1159 - err = PTR_ERR(bb); 1160 - return err; 1159 + err = xe_bb_init(bb, bb_pool, batch_size); 1160 + if (err) { 1161 + xe_gt_err(gt, "BB allocation failed.\n"); 1162 + xe_bb_free(bb, NULL); 1163 + return err; 1164 + } 1165 + 1166 + batch_size_allocated = batch_size; 1167 + size = xe_bo_size(src_bo); 1168 + batch_size = 0; 1169 + 1170 + /* 1171 + * Emit PTE and copy commands here. 1172 + * The CCS copy command can only support limited size. If the size to be 1173 + * copied is more than the limit, divide copy into chunks. So, calculate 1174 + * sizes here again before copy command is emitted. 1175 + */ 1176 + 1177 + while (size) { 1178 + batch_size += 10; /* Flush + ggtt addr + 2 NOP */ 1179 + u32 flush_flags = 0; 1180 + u64 ccs_ofs, ccs_size; 1181 + u32 ccs_pt; 1182 + 1183 + u32 avail_pts = max_mem_transfer_per_pass(xe) / 1184 + LEVEL0_PAGE_TABLE_ENCODE_SIZE; 1185 + 1186 + src_L0 = xe_migrate_res_sizes(m, &src_it); 1187 + 1188 + batch_size += pte_update_size(m, false, src, &src_it, &src_L0, 1189 + &src_L0_ofs, &src_L0_pt, 0, 0, 1190 + avail_pts); 1191 + 1192 + ccs_size = xe_device_ccs_bytes(xe, src_L0); 1193 + batch_size += pte_update_size(m, 0, NULL, &ccs_it, &ccs_size, &ccs_ofs, 1194 + &ccs_pt, 0, avail_pts, avail_pts); 1195 + xe_assert(xe, IS_ALIGNED(ccs_it.start, PAGE_SIZE)); 1196 + batch_size += EMIT_COPY_CCS_DW; 1197 + 1198 + emit_pte(m, bb, src_L0_pt, false, true, &src_it, src_L0, src); 1199 + 1200 + emit_pte(m, bb, ccs_pt, false, false, &ccs_it, ccs_size, src); 1201 + 1202 + bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags); 1203 + flush_flags = xe_migrate_ccs_copy(m, bb, src_L0_ofs, src_is_pltt, 1204 + src_L0_ofs, dst_is_pltt, 1205 + src_L0, ccs_ofs, true); 1206 + bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags); 1207 + 1208 + size -= src_L0; 1209 + } 1210 + 1211 + xe_assert(xe, (batch_size_allocated == bb->len)); 1212 + src_bo->bb_ccs[read_write] = bb; 1213 + 1214 + xe_sriov_vf_ccs_rw_update_bb_addr(ctx); 1215 + xe_sa_bo_sync_shadow(bb->bo); 1161 1216 } 1162 1217 1163 - batch_size_allocated = batch_size; 1164 - size = xe_bo_size(src_bo); 1165 - batch_size = 0; 1166 - 1167 - /* 1168 - * Emit PTE and copy commands here. 1169 - * The CCS copy command can only support limited size. If the size to be 1170 - * copied is more than the limit, divide copy into chunks. So, calculate 1171 - * sizes here again before copy command is emitted. 
1172 - */ 1173 - while (size) { 1174 - batch_size += 10; /* Flush + ggtt addr + 2 NOP */ 1175 - u32 flush_flags = 0; 1176 - u64 ccs_ofs, ccs_size; 1177 - u32 ccs_pt; 1178 - 1179 - u32 avail_pts = max_mem_transfer_per_pass(xe) / LEVEL0_PAGE_TABLE_ENCODE_SIZE; 1180 - 1181 - src_L0 = xe_migrate_res_sizes(m, &src_it); 1182 - 1183 - batch_size += pte_update_size(m, false, src, &src_it, &src_L0, 1184 - &src_L0_ofs, &src_L0_pt, 0, 0, 1185 - avail_pts); 1186 - 1187 - ccs_size = xe_device_ccs_bytes(xe, src_L0); 1188 - batch_size += pte_update_size(m, 0, NULL, &ccs_it, &ccs_size, &ccs_ofs, 1189 - &ccs_pt, 0, avail_pts, avail_pts); 1190 - xe_assert(xe, IS_ALIGNED(ccs_it.start, PAGE_SIZE)); 1191 - batch_size += EMIT_COPY_CCS_DW; 1192 - 1193 - emit_pte(m, bb, src_L0_pt, false, true, &src_it, src_L0, src); 1194 - 1195 - emit_pte(m, bb, ccs_pt, false, false, &ccs_it, ccs_size, src); 1196 - 1197 - bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags); 1198 - flush_flags = xe_migrate_ccs_copy(m, bb, src_L0_ofs, src_is_pltt, 1199 - src_L0_ofs, dst_is_pltt, 1200 - src_L0, ccs_ofs, true); 1201 - bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags); 1202 - 1203 - size -= src_L0; 1204 - } 1205 - 1206 - xe_assert(xe, (batch_size_allocated == bb->len)); 1207 - src_bo->bb_ccs[read_write] = bb; 1208 - 1209 - xe_sriov_vf_ccs_rw_update_bb_addr(ctx); 1210 - xe_sa_bo_sync_shadow(bb->bo); 1211 1218 return 0; 1212 1219 } 1213 1220
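The xe_migrate.c rework above switches from a plain guard(mutex) to a scoped_guard(mutex, ...) block, so the suballocator swap lock from xe_sa_bo_swap_guard() is held exactly for the shadow swap, the CCS batch-buffer build and the shadow sync, while the xe_bb allocation moves out from under it. A rough userspace equivalent of the scoped-guard idea (the real helper lives in the kernel's <linux/cleanup.h> and is more general), using a pthread mutex and the GCC/Clang cleanup attribute:

#include <pthread.h>
#include <stdio.h>

static void unlock_cleanup(pthread_mutex_t **m)
{
	pthread_mutex_unlock(*m);
}

/* Lock for exactly the following block; the unlock runs when _guard goes
 * out of scope, including on break or return. */
#define scoped_mutex_guard(m) \
	for (pthread_mutex_t *_guard __attribute__((cleanup(unlock_cleanup))) = \
		(pthread_mutex_lock(m), (m)); _guard; _guard = NULL)

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

int main(void)
{
	scoped_mutex_guard(&lock) {
		/* critical section: swap shadow, build the BB, sync shadow */
		puts("holding the lock");
	}
	return 0;
}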
+1 -6
drivers/gpu/drm/xe/xe_mmio.h
··· 6 6 #ifndef _XE_MMIO_H_ 7 7 #define _XE_MMIO_H_ 8 8 9 - #include "xe_gt_types.h" 9 + #include "xe_mmio_types.h" 10 10 11 11 struct xe_device; 12 12 struct xe_reg; ··· 35 35 if (addr < mmio->adj_limit) 36 36 addr += mmio->adj_offset; 37 37 return addr; 38 - } 39 - 40 - static inline struct xe_mmio *xe_root_tile_mmio(struct xe_device *xe) 41 - { 42 - return &xe->tiles[0].mmio; 43 38 } 44 39 45 40 #ifdef CONFIG_PCI_IOV
+64
drivers/gpu/drm/xe/xe_mmio_types.h
··· 1 + /* SPDX-License-Identifier: MIT */ 2 + /* 3 + * Copyright © 2022-2026 Intel Corporation 4 + */ 5 + 6 + #ifndef _XE_MMIO_TYPES_H_ 7 + #define _XE_MMIO_TYPES_H_ 8 + 9 + #include <linux/types.h> 10 + 11 + struct xe_gt; 12 + struct xe_tile; 13 + 14 + /** 15 + * struct xe_mmio - register mmio structure 16 + * 17 + * Represents an MMIO region that the CPU may use to access registers. A 18 + * region may share its IO map with other regions (e.g., all GTs within a 19 + * tile share the same map with their parent tile, but represent different 20 + * subregions of the overall IO space). 21 + */ 22 + struct xe_mmio { 23 + /** @tile: Backpointer to tile, used for tracing */ 24 + struct xe_tile *tile; 25 + 26 + /** @regs: Map used to access registers. */ 27 + void __iomem *regs; 28 + 29 + /** 30 + * @sriov_vf_gt: Backpointer to GT. 31 + * 32 + * This pointer is only set for GT MMIO regions and only when running 33 + * as an SRIOV VF structure 34 + */ 35 + struct xe_gt *sriov_vf_gt; 36 + 37 + /** 38 + * @regs_size: Length of the register region within the map. 39 + * 40 + * The size of the iomap set in *regs is generally larger than the 41 + * register mmio space since it includes unused regions and/or 42 + * non-register regions such as the GGTT PTEs. 43 + */ 44 + size_t regs_size; 45 + 46 + /** @adj_limit: adjust MMIO address if address is below this value */ 47 + u32 adj_limit; 48 + 49 + /** @adj_offset: offset to add to MMIO address when adjusting */ 50 + u32 adj_offset; 51 + }; 52 + 53 + /** 54 + * struct xe_mmio_range - register range structure 55 + * 56 + * @start: first register offset in the range. 57 + * @end: last register offset in the range. 58 + */ 59 + struct xe_mmio_range { 60 + u32 start; 61 + u32 end; 62 + }; 63 + 64 + #endif
+1
drivers/gpu/drm/xe/xe_mocs.c
··· 600 600 info->wb_index = 4; 601 601 info->unused_entries_index = 4; 602 602 break; 603 + case XE_NOVALAKE_P: 603 604 case XE_NOVALAKE_S: 604 605 case XE_PANTHERLAKE: 605 606 case XE_LUNARLAKE:
+16 -28
drivers/gpu/drm/xe/xe_module.c
··· 10 10 11 11 #include <drm/drm_module.h> 12 12 13 + #include "xe_defaults.h" 13 14 #include "xe_device_types.h" 14 15 #include "xe_drv.h" 15 16 #include "xe_configfs.h" ··· 20 19 #include "xe_observation.h" 21 20 #include "xe_sched_job.h" 22 21 23 - #if IS_ENABLED(CONFIG_DRM_XE_DEBUG) 24 - #define DEFAULT_GUC_LOG_LEVEL 3 25 - #else 26 - #define DEFAULT_GUC_LOG_LEVEL 1 27 - #endif 28 - 29 - #define DEFAULT_PROBE_DISPLAY true 30 - #define DEFAULT_VRAM_BAR_SIZE 0 31 - #define DEFAULT_FORCE_PROBE CONFIG_DRM_XE_FORCE_PROBE 32 - #define DEFAULT_MAX_VFS ~0 33 - #define DEFAULT_MAX_VFS_STR "unlimited" 34 - #define DEFAULT_WEDGED_MODE XE_WEDGED_MODE_DEFAULT 35 - #define DEFAULT_WEDGED_MODE_STR XE_WEDGED_MODE_DEFAULT_STR 36 - #define DEFAULT_SVM_NOTIFIER_SIZE 512 37 - 38 22 struct xe_modparam xe_modparam = { 39 - .probe_display = DEFAULT_PROBE_DISPLAY, 40 - .guc_log_level = DEFAULT_GUC_LOG_LEVEL, 41 - .force_probe = DEFAULT_FORCE_PROBE, 23 + .probe_display = XE_DEFAULT_PROBE_DISPLAY, 24 + .guc_log_level = XE_DEFAULT_GUC_LOG_LEVEL, 25 + .force_probe = XE_DEFAULT_FORCE_PROBE, 42 26 #ifdef CONFIG_PCI_IOV 43 - .max_vfs = DEFAULT_MAX_VFS, 27 + .max_vfs = XE_DEFAULT_MAX_VFS, 44 28 #endif 45 - .wedged_mode = DEFAULT_WEDGED_MODE, 46 - .svm_notifier_size = DEFAULT_SVM_NOTIFIER_SIZE, 29 + .wedged_mode = XE_DEFAULT_WEDGED_MODE, 30 + .svm_notifier_size = XE_DEFAULT_SVM_NOTIFIER_SIZE, 47 31 /* the rest are 0 by default */ 48 32 }; 49 33 50 34 module_param_named(svm_notifier_size, xe_modparam.svm_notifier_size, uint, 0600); 51 35 MODULE_PARM_DESC(svm_notifier_size, "Set the svm notifier size in MiB, must be power of 2 " 52 - "[default=" __stringify(DEFAULT_SVM_NOTIFIER_SIZE) "]"); 36 + "[default=" __stringify(XE_DEFAULT_SVM_NOTIFIER_SIZE) "]"); 53 37 54 38 module_param_named_unsafe(force_execlist, xe_modparam.force_execlist, bool, 0444); 55 39 MODULE_PARM_DESC(force_execlist, "Force Execlist submission"); 56 40 41 + #if IS_ENABLED(CONFIG_DRM_XE_DISPLAY) 57 42 module_param_named(probe_display, xe_modparam.probe_display, bool, 0444); 58 43 MODULE_PARM_DESC(probe_display, "Probe display HW, otherwise it's left untouched " 59 - "[default=" __stringify(DEFAULT_PROBE_DISPLAY) "])"); 44 + "[default=" __stringify(XE_DEFAULT_PROBE_DISPLAY) "])"); 45 + #endif 60 46 61 47 module_param_named(vram_bar_size, xe_modparam.force_vram_bar_size, int, 0600); 62 48 MODULE_PARM_DESC(vram_bar_size, "Set the vram bar size in MiB (<0=disable-resize, 0=max-needed-size, >0=force-size " 63 - "[default=" __stringify(DEFAULT_VRAM_BAR_SIZE) "])"); 49 + "[default=" __stringify(XE_DEFAULT_VRAM_BAR_SIZE) "])"); 64 50 65 51 module_param_named(guc_log_level, xe_modparam.guc_log_level, int, 0600); 66 52 MODULE_PARM_DESC(guc_log_level, "GuC firmware logging level (0=disable, 1=normal, 2..5=verbose-levels " 67 - "[default=" __stringify(DEFAULT_GUC_LOG_LEVEL) "])"); 53 + "[default=" __stringify(XE_DEFAULT_GUC_LOG_LEVEL) "])"); 68 54 69 55 module_param_named_unsafe(guc_firmware_path, xe_modparam.guc_firmware_path, charp, 0400); 70 56 MODULE_PARM_DESC(guc_firmware_path, ··· 68 80 module_param_named_unsafe(force_probe, xe_modparam.force_probe, charp, 0400); 69 81 MODULE_PARM_DESC(force_probe, 70 82 "Force probe options for specified devices. 
See CONFIG_DRM_XE_FORCE_PROBE for details " 71 - "[default=" DEFAULT_FORCE_PROBE "])"); 83 + "[default=" XE_DEFAULT_FORCE_PROBE "])"); 72 84 73 85 #ifdef CONFIG_PCI_IOV 74 86 module_param_named(max_vfs, xe_modparam.max_vfs, uint, 0400); 75 87 MODULE_PARM_DESC(max_vfs, 76 88 "Limit number of Virtual Functions (VFs) that could be managed. " 77 89 "(0=no VFs; N=allow up to N VFs " 78 - "[default=" DEFAULT_MAX_VFS_STR "])"); 90 + "[default=" XE_DEFAULT_MAX_VFS_STR "])"); 79 91 #endif 80 92 81 93 module_param_named_unsafe(wedged_mode, xe_modparam.wedged_mode, uint, 0600); 82 94 MODULE_PARM_DESC(wedged_mode, 83 95 "Module's default policy for the wedged mode (0=never, 1=upon-critical-error, 2=upon-any-hang-no-reset " 84 - "[default=" DEFAULT_WEDGED_MODE_STR "])"); 96 + "[default=" XE_DEFAULT_WEDGED_MODE_STR "])"); 85 97 86 98 static int xe_check_nomodeset(void) 87 99 {
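The xe_module.c changes above move the default values into xe_defaults.h and keep quoting them in the parameter descriptions via __stringify(). That macro is the usual two-step stringification, so the default is macro-expanded before being turned into a string literal; a userspace model of the trick, reusing XE_DEFAULT_SVM_NOTIFIER_SIZE only as an example value:

#include <stdio.h>

#define __stringify_1(x)	#x
#define __stringify(x)		__stringify_1(x)	/* expand, then stringize */

#define XE_DEFAULT_SVM_NOTIFIER_SIZE 512

int main(void)
{
	/* String literal concatenation yields "... [default=512]" at compile time */
	puts("svm_notifier_size in MiB [default="
	     __stringify(XE_DEFAULT_SVM_NOTIFIER_SIZE) "]");
	return 0;
}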
+5 -7
drivers/gpu/drm/xe/xe_nvm.c
··· 6 6 #include <linux/intel_dg_nvm_aux.h> 7 7 #include <linux/pci.h> 8 8 9 - #include "xe_device_types.h" 9 + #include "xe_device.h" 10 10 #include "xe_mmio.h" 11 11 #include "xe_nvm.h" 12 12 #include "xe_pcode_api.h" ··· 133 133 if (WARN_ON(xe->nvm)) 134 134 return -EFAULT; 135 135 136 - xe->nvm = kzalloc_obj(*nvm); 137 - if (!xe->nvm) 136 + nvm = kzalloc_obj(*nvm); 137 + if (!nvm) 138 138 return -ENOMEM; 139 - 140 - nvm = xe->nvm; 141 139 142 140 nvm->writable_override = xe_nvm_writable_override(xe); 143 141 nvm->non_posted_erase = xe_nvm_non_posted_erase(xe); ··· 163 165 if (ret) { 164 166 drm_err(&xe->drm, "xe-nvm aux init failed %d\n", ret); 165 167 kfree(nvm); 166 - xe->nvm = NULL; 167 168 return ret; 168 169 } 169 170 ··· 170 173 if (ret) { 171 174 drm_err(&xe->drm, "xe-nvm aux add failed %d\n", ret); 172 175 auxiliary_device_uninit(aux_dev); 173 - xe->nvm = NULL; 174 176 return ret; 175 177 } 178 + 179 + xe->nvm = nvm; 176 180 return devm_add_action_or_reset(xe->drm.dev, xe_nvm_fini, xe); 177 181 }
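The xe_nvm.c fix above allocates into a local pointer and only assigns xe->nvm after the auxiliary device has been fully initialized and added, so the error paths no longer need to reset the device pointer by hand. A generic userspace sketch of that allocate-locally, publish-on-success pattern (the struct and helper names here are invented):

#include <errno.h>
#include <stdlib.h>

struct nvm_dev { int ready; };
struct dev { struct nvm_dev *nvm; };

/* Hypothetical fallible step standing in for the aux device init/add */
static int nvm_setup(struct nvm_dev *n)
{
	n->ready = 1;
	return 0;
}

static int dev_nvm_init(struct dev *d)
{
	struct nvm_dev *n = calloc(1, sizeof(*n));
	int ret;

	if (!n)
		return -ENOMEM;

	ret = nvm_setup(n);
	if (ret) {
		free(n);	/* d->nvm was never written, nothing to undo there */
		return ret;
	}

	d->nvm = n;	/* publish only after every step has succeeded */
	return 0;
}

int main(void)
{
	struct dev d = { 0 };

	return dev_nvm_init(&d);
}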
+2 -22
drivers/gpu/drm/xe/xe_oa.c
··· 29 29 #include "xe_gt.h" 30 30 #include "xe_gt_mcr.h" 31 31 #include "xe_gt_printk.h" 32 - #include "xe_guc_pc.h" 32 + #include "xe_guc_rc.h" 33 33 #include "xe_macros.h" 34 34 #include "xe_mmio.h" 35 35 #include "xe_oa.h" ··· 873 873 xe_force_wake_put(gt_to_fw(gt), stream->fw_ref); 874 874 xe_pm_runtime_put(stream->oa->xe); 875 875 876 - /* Wa_1509372804:pvc: Unset the override of GUCRC mode to enable rc6 */ 877 - if (stream->override_gucrc) 878 - xe_gt_WARN_ON(gt, xe_guc_pc_unset_gucrc_mode(&gt->uc.guc.pc)); 879 - 880 876 xe_oa_free_configs(stream); 881 877 xe_file_put(stream->xef); 882 878 } ··· 965 969 struct xe_oa_fence *ofence = container_of(cb, typeof(*ofence), cb); 966 970 967 971 INIT_DELAYED_WORK(&ofence->work, xe_oa_fence_work_fn); 968 - queue_delayed_work(system_unbound_wq, &ofence->work, 972 + queue_delayed_work(system_dfl_wq, &ofence->work, 969 973 usecs_to_jiffies(NOA_PROGRAM_ADDITIONAL_DELAY_US)); 970 974 dma_fence_put(fence); 971 975 } ··· 1756 1760 goto exit; 1757 1761 } 1758 1762 1759 - /* 1760 - * GuC reset of engines causes OA to lose configuration 1761 - * state. Prevent this by overriding GUCRC mode. 1762 - */ 1763 - if (XE_GT_WA(stream->gt, 1509372804)) { 1764 - ret = xe_guc_pc_override_gucrc_mode(&gt->uc.guc.pc, 1765 - SLPC_GUCRC_MODE_GUCRC_NO_RC6); 1766 - if (ret) 1767 - goto err_free_configs; 1768 - 1769 - stream->override_gucrc = true; 1770 - } 1771 - 1772 1763 /* Take runtime pm ref and forcewake to disable RC6 */ 1773 1764 xe_pm_runtime_get(stream->oa->xe); 1774 1765 stream->fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL); ··· 1806 1823 err_fw_put: 1807 1824 xe_force_wake_put(gt_to_fw(gt), stream->fw_ref); 1808 1825 xe_pm_runtime_put(stream->oa->xe); 1809 - if (stream->override_gucrc) 1810 - xe_gt_WARN_ON(gt, xe_guc_pc_unset_gucrc_mode(&gt->uc.guc.pc)); 1811 - err_free_configs: 1812 1826 xe_oa_free_configs(stream); 1813 1827 exit: 1814 1828 xe_file_put(stream->xef);
-3
drivers/gpu/drm/xe/xe_oa_types.h
··· 239 239 /** @poll_period_ns: hrtimer period for checking OA buffer for available data */ 240 240 u64 poll_period_ns; 241 241 242 - /** @override_gucrc: GuC RC has been overridden for the OA stream */ 243 - bool override_gucrc; 244 - 245 242 /** @oa_status: temporary storage for oa_status register value */ 246 243 u32 oa_status; 247 244
+20 -11
drivers/gpu/drm/xe/xe_pagefault.c
··· 136 136 static bool 137 137 xe_pagefault_access_is_atomic(enum xe_pagefault_access_type access_type) 138 138 { 139 - return access_type == XE_PAGEFAULT_ACCESS_TYPE_ATOMIC; 139 + return (access_type & XE_PAGEFAULT_ACCESS_TYPE_MASK) == XE_PAGEFAULT_ACCESS_TYPE_ATOMIC; 140 140 } 141 141 142 142 static struct xe_vm *xe_pagefault_asid_to_vm(struct xe_device *xe, u32 asid) ··· 164 164 bool atomic; 165 165 166 166 /* Producer flagged this fault to be nacked */ 167 - if (pf->consumer.fault_level == XE_PAGEFAULT_LEVEL_NACK) 167 + if (pf->consumer.fault_type_level == XE_PAGEFAULT_TYPE_LEVEL_NACK) 168 168 return -EFAULT; 169 169 170 170 vm = xe_pagefault_asid_to_vm(xe, pf->consumer.asid); ··· 225 225 { 226 226 xe_gt_info(pf->gt, "\n\tASID: %d\n" 227 227 "\tFaulted Address: 0x%08x%08x\n" 228 - "\tFaultType: %d\n" 229 - "\tAccessType: %d\n" 230 - "\tFaultLevel: %d\n" 228 + "\tFaultType: %lu\n" 229 + "\tAccessType: %lu\n" 230 + "\tFaultLevel: %lu\n" 231 231 "\tEngineClass: %d %s\n" 232 232 "\tEngineInstance: %d\n", 233 233 pf->consumer.asid, 234 234 upper_32_bits(pf->consumer.page_addr), 235 235 lower_32_bits(pf->consumer.page_addr), 236 - pf->consumer.fault_type, 237 - pf->consumer.access_type, 238 - pf->consumer.fault_level, 236 + FIELD_GET(XE_PAGEFAULT_TYPE_MASK, 237 + pf->consumer.fault_type_level), 238 + FIELD_GET(XE_PAGEFAULT_ACCESS_TYPE_MASK, 239 + pf->consumer.access_type), 240 + FIELD_GET(XE_PAGEFAULT_LEVEL_MASK, 241 + pf->consumer.fault_type_level), 239 242 pf->consumer.engine_class, 240 243 xe_hw_engine_class_to_str(pf->consumer.engine_class), 241 244 pf->consumer.engine_instance); ··· 262 259 263 260 err = xe_pagefault_service(&pf); 264 261 if (err) { 265 - xe_pagefault_print(&pf); 266 - xe_gt_info(pf.gt, "Fault response: Unsuccessful %pe\n", 267 - ERR_PTR(err)); 262 + if (!(pf.consumer.access_type & XE_PAGEFAULT_ACCESS_PREFETCH)) { 263 + xe_pagefault_print(&pf); 264 + xe_gt_info(pf.gt, "Fault response: Unsuccessful %pe\n", 265 + ERR_PTR(err)); 266 + } else { 267 + xe_gt_stats_incr(pf.gt, XE_GT_STATS_ID_INVALID_PREFETCH_PAGEFAULT_COUNT, 1); 268 + xe_gt_dbg(pf.gt, "Prefetch Fault response: Unsuccessful %pe\n", 269 + ERR_PTR(err)); 270 + } 268 271 } 269 272 270 273 pf.producer.ops->ack_fault(&pf, err);
+11 -9
drivers/gpu/drm/xe/xe_pagefault_types.h
··· 68 68 /** @consumer.asid: address space ID */ 69 69 u32 asid; 70 70 /** 71 - * @consumer.access_type: access type, u8 rather than enum to 72 - * keep size compact 71 + * @consumer.access_type: access type and prefetch flag packed 72 + * into a u8. 73 73 */ 74 74 u8 access_type; 75 + #define XE_PAGEFAULT_ACCESS_TYPE_MASK GENMASK(1, 0) 76 + #define XE_PAGEFAULT_ACCESS_PREFETCH BIT(7) 75 77 /** 76 - * @consumer.fault_type: fault type, u8 rather than enum to 77 - * keep size compact 78 + * @consumer.fault_type_level: fault type and level, u8 rather 79 + * than enum to keep size compact 78 80 */ 79 - u8 fault_type; 80 - #define XE_PAGEFAULT_LEVEL_NACK 0xff /* Producer indicates nack fault */ 81 - /** @consumer.fault_level: fault level */ 82 - u8 fault_level; 81 + u8 fault_type_level; 82 + #define XE_PAGEFAULT_TYPE_LEVEL_NACK 0xff /* Producer indicates nack fault */ 83 + #define XE_PAGEFAULT_LEVEL_MASK GENMASK(3, 0) 84 + #define XE_PAGEFAULT_TYPE_MASK GENMASK(7, 4) 83 85 /** @consumer.engine_class: engine class */ 84 86 u8 engine_class; 85 87 /** @consumer.engine_instance: engine instance */ 86 88 u8 engine_instance; 87 89 /** consumer.reserved: reserved bits for future expansion */ 88 - u8 reserved[7]; 90 + u64 reserved; 89 91 } consumer; 90 92 /** 91 93 * @producer: State for the producer (i.e., HW/FW interface). Populated
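With the xe_pagefault_types.h change above, fault type and fault level now share one u8 (GENMASK(7, 4) and GENMASK(3, 0)), and the access-type byte gains a prefetch flag in BIT(7), which xe_pagefault.c unpacks with FIELD_GET(). A userspace model of that packing using the same mask values; the concrete type, level and access numbers are made up for the demo and simple shifts stand in for FIELD_PREP()/FIELD_GET():

#include <stdint.h>
#include <stdio.h>

#define PF_LEVEL_MASK	0x0fu	/* GENMASK(3, 0) */
#define PF_TYPE_MASK	0xf0u	/* GENMASK(7, 4) */
#define PF_ACCESS_MASK	0x03u	/* GENMASK(1, 0) */
#define PF_PREFETCH	0x80u	/* BIT(7) */

static uint8_t pack_type_level(unsigned int type, unsigned int level)
{
	return (uint8_t)(((type << 4) & PF_TYPE_MASK) | (level & PF_LEVEL_MASK));
}

int main(void)
{
	uint8_t ftl = pack_type_level(1, 3);	/* made-up type 1, level 3 */
	uint8_t access = 0x2u | PF_PREFETCH;	/* made-up access type, prefetch set */

	printf("type=%u level=%u access=%u prefetch=%d\n",
	       (ftl & PF_TYPE_MASK) >> 4, ftl & PF_LEVEL_MASK,
	       access & PF_ACCESS_MASK, !!(access & PF_PREFETCH));
	return 0;
}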
+88 -61
drivers/gpu/drm/xe/xe_pat.c
··· 88 88 void (*program_media)(struct xe_gt *gt, const struct xe_pat_table_entry table[], 89 89 int n_entries); 90 90 int (*dump)(struct xe_gt *gt, struct drm_printer *p); 91 + void (*entry_dump)(struct drm_printer *p, const char *label, u32 pat, bool rsvd); 91 92 }; 92 93 93 94 static const struct xe_pat_table_entry xelp_pat_table[] = { ··· 124 123 * - no_promote: 0=promotable, 1=no promote 125 124 * - comp_en: 0=disable, 1=enable 126 125 * - l3clos: L3 class of service (0-3) 127 - * - l3_policy: 0=WB, 1=XD ("WB - Transient Display"), 3=UC 126 + * - l3_policy: 0=WB, 1=XD ("WB - Transient Display"), 127 + * 2=XA ("WB - Transient App" for Xe3p), 3=UC 128 128 * - l4_policy: 0=WB, 1=WT, 3=UC 129 129 * - coh_mode: 0=no snoop, 2=1-way coherent, 3=2-way coherent 130 130 * ··· 254 252 [31] = XE3P_XPC_PAT( 0, 3, 0, 0, 3 ), 255 253 }; 256 254 255 + static const struct xe_pat_table_entry xe3p_primary_pat_pta = XE2_PAT(0, 0, 0, 0, 0, 3); 256 + static const struct xe_pat_table_entry xe3p_media_pat_pta = XE2_PAT(0, 0, 0, 0, 0, 2); 257 + 258 + static const struct xe_pat_table_entry xe3p_lpg_pat_table[] = { 259 + [ 0] = XE2_PAT( 0, 0, 0, 0, 3, 0 ), 260 + [ 1] = XE2_PAT( 0, 0, 0, 0, 3, 2 ), 261 + [ 2] = XE2_PAT( 0, 0, 0, 0, 3, 3 ), 262 + [ 3] = XE2_PAT( 0, 0, 0, 3, 3, 0 ), 263 + [ 4] = XE2_PAT( 0, 0, 0, 3, 0, 2 ), 264 + [ 5] = XE2_PAT( 0, 0, 0, 3, 3, 2 ), 265 + [ 6] = XE2_PAT( 1, 0, 0, 1, 3, 0 ), 266 + [ 7] = XE2_PAT( 0, 0, 0, 3, 0, 3 ), 267 + [ 8] = XE2_PAT( 0, 0, 0, 3, 0, 0 ), 268 + [ 9] = XE2_PAT( 0, 1, 0, 0, 3, 0 ), 269 + [10] = XE2_PAT( 0, 1, 0, 3, 0, 0 ), 270 + [11] = XE2_PAT( 1, 1, 0, 1, 3, 0 ), 271 + [12] = XE2_PAT( 0, 1, 0, 3, 3, 0 ), 272 + [13] = XE2_PAT( 0, 0, 0, 0, 0, 0 ), 273 + [14] = XE2_PAT( 0, 1, 0, 0, 0, 0 ), 274 + [15] = XE2_PAT( 1, 1, 0, 1, 1, 0 ), 275 + [16] = XE2_PAT( 0, 1, 0, 0, 3, 2 ), 276 + /* 17 is reserved; leave set to all 0's */ 277 + [18] = XE2_PAT( 1, 0, 0, 2, 3, 0 ), 278 + [19] = XE2_PAT( 1, 0, 0, 2, 3, 2 ), 279 + [20] = XE2_PAT( 0, 0, 1, 0, 3, 0 ), 280 + [21] = XE2_PAT( 0, 1, 1, 0, 3, 0 ), 281 + [22] = XE2_PAT( 0, 0, 1, 0, 3, 2 ), 282 + [23] = XE2_PAT( 0, 0, 1, 0, 3, 3 ), 283 + [24] = XE2_PAT( 0, 0, 2, 0, 3, 0 ), 284 + [25] = XE2_PAT( 0, 1, 2, 0, 3, 0 ), 285 + [26] = XE2_PAT( 0, 0, 2, 0, 3, 2 ), 286 + [27] = XE2_PAT( 0, 0, 2, 0, 3, 3 ), 287 + [28] = XE2_PAT( 0, 0, 3, 0, 3, 0 ), 288 + [29] = XE2_PAT( 0, 1, 3, 0, 3, 0 ), 289 + [30] = XE2_PAT( 0, 0, 3, 0, 3, 2 ), 290 + [31] = XE2_PAT( 0, 0, 3, 0, 3, 3 ), 291 + }; 292 + 257 293 u16 xe_pat_index_get_coh_mode(struct xe_device *xe, u16 pat_index) 258 294 { 259 295 WARN_ON(pat_index >= xe->pat.n_entries); ··· 324 284 325 285 if (xe->pat.pat_ats) 326 286 xe_mmio_write32(&gt->mmio, XE_REG(_PAT_ATS), xe->pat.pat_ats->value); 327 - if (xe->pat.pat_pta) 328 - xe_mmio_write32(&gt->mmio, XE_REG(_PAT_PTA), xe->pat.pat_pta->value); 287 + if (xe->pat.pat_primary_pta && xe_gt_is_main_type(gt)) 288 + xe_mmio_write32(&gt->mmio, XE_REG(_PAT_PTA), xe->pat.pat_primary_pta->value); 289 + if (xe->pat.pat_media_pta && xe_gt_is_media_type(gt)) 290 + xe_mmio_write32(&gt->mmio, XE_REG(_PAT_PTA), xe->pat.pat_media_pta->value); 329 291 } 330 292 331 293 static void program_pat_mcr(struct xe_gt *gt, const struct xe_pat_table_entry table[], ··· 343 301 344 302 if (xe->pat.pat_ats) 345 303 xe_gt_mcr_multicast_write(gt, XE_REG_MCR(_PAT_ATS), xe->pat.pat_ats->value); 346 - if (xe->pat.pat_pta) 347 - xe_gt_mcr_multicast_write(gt, XE_REG_MCR(_PAT_PTA), xe->pat.pat_pta->value); 304 + if (xe->pat.pat_primary_pta && xe_gt_is_main_type(gt)) 305 + 
xe_gt_mcr_multicast_write(gt, XE_REG_MCR(_PAT_PTA), xe->pat.pat_primary_pta->value); 306 + if (xe->pat.pat_media_pta && xe_gt_is_media_type(gt)) 307 + xe_gt_mcr_multicast_write(gt, XE_REG_MCR(_PAT_PTA), xe->pat.pat_media_pta->value); 348 308 } 349 309 350 310 static int xelp_dump(struct xe_gt *gt, struct drm_printer *p) ··· 502 458 pat = xe_gt_mcr_unicast_read_any(gt, XE_REG_MCR(_PAT_INDEX(i))); 503 459 504 460 xe_pat_index_label(label, sizeof(label), i); 505 - xe2_pat_entry_dump(p, label, pat, !xe->pat.table[i].valid); 461 + xe->pat.ops->entry_dump(p, label, pat, !xe->pat.table[i].valid); 506 462 } 507 463 508 464 /* ··· 515 471 pat = xe_gt_mcr_unicast_read_any(gt, XE_REG_MCR(_PAT_PTA)); 516 472 517 473 drm_printf(p, "Page Table Access:\n"); 518 - xe2_pat_entry_dump(p, "PTA_MODE", pat, false); 474 + xe->pat.ops->entry_dump(p, "PTA_MODE", pat, false); 519 475 520 476 return 0; 521 477 } ··· 524 480 .program_graphics = program_pat_mcr, 525 481 .program_media = program_pat, 526 482 .dump = xe2_dump, 483 + .entry_dump = xe2_pat_entry_dump, 527 484 }; 528 - 529 - static int xe3p_xpc_dump(struct xe_gt *gt, struct drm_printer *p) 530 - { 531 - struct xe_device *xe = gt_to_xe(gt); 532 - u32 pat; 533 - int i; 534 - char label[PAT_LABEL_LEN]; 535 - 536 - CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT); 537 - if (!fw_ref.domains) 538 - return -ETIMEDOUT; 539 - 540 - drm_printf(p, "PAT table: (* = reserved entry)\n"); 541 - 542 - for (i = 0; i < xe->pat.n_entries; i++) { 543 - pat = xe_gt_mcr_unicast_read_any(gt, XE_REG_MCR(_PAT_INDEX(i))); 544 - 545 - xe_pat_index_label(label, sizeof(label), i); 546 - xe3p_xpc_pat_entry_dump(p, label, pat, !xe->pat.table[i].valid); 547 - } 548 - 549 - /* 550 - * Also print PTA_MODE, which describes how the hardware accesses 551 - * PPGTT entries. 
552 - */ 553 - pat = xe_gt_mcr_unicast_read_any(gt, XE_REG_MCR(_PAT_PTA)); 554 - 555 - drm_printf(p, "Page Table Access:\n"); 556 - xe3p_xpc_pat_entry_dump(p, "PTA_MODE", pat, false); 557 - 558 - return 0; 559 - } 560 485 561 486 static const struct xe_pat_ops xe3p_xpc_pat_ops = { 562 487 .program_graphics = program_pat_mcr, 563 488 .program_media = program_pat, 564 - .dump = xe3p_xpc_dump, 489 + .dump = xe2_dump, 490 + .entry_dump = xe3p_xpc_pat_entry_dump, 565 491 }; 566 492 567 493 void xe_pat_init_early(struct xe_device *xe) ··· 541 527 xe->pat.ops = &xe3p_xpc_pat_ops; 542 528 xe->pat.table = xe3p_xpc_pat_table; 543 529 xe->pat.pat_ats = &xe3p_xpc_pat_ats; 544 - xe->pat.pat_pta = &xe3p_xpc_pat_pta; 530 + xe->pat.pat_primary_pta = &xe3p_xpc_pat_pta; 531 + xe->pat.pat_media_pta = &xe3p_xpc_pat_pta; 545 532 xe->pat.n_entries = ARRAY_SIZE(xe3p_xpc_pat_table); 546 533 xe->pat.idx[XE_CACHE_NONE] = 3; 547 534 xe->pat.idx[XE_CACHE_WT] = 3; /* N/A (no display); use UC */ 548 535 xe->pat.idx[XE_CACHE_WB] = 2; 536 + } else if (GRAPHICS_VER(xe) == 35) { 537 + xe->pat.ops = &xe2_pat_ops; 538 + xe->pat.table = xe3p_lpg_pat_table; 539 + xe->pat.pat_ats = &xe2_pat_ats; 540 + if (!IS_DGFX(xe)) { 541 + xe->pat.pat_primary_pta = &xe3p_primary_pat_pta; 542 + xe->pat.pat_media_pta = &xe3p_media_pat_pta; 543 + } 544 + xe->pat.n_entries = ARRAY_SIZE(xe3p_lpg_pat_table); 545 + xe->pat.idx[XE_CACHE_NONE] = 3; 546 + xe->pat.idx[XE_CACHE_WT] = 15; 547 + xe->pat.idx[XE_CACHE_WB] = 2; 548 + xe->pat.idx[XE_CACHE_NONE_COMPRESSION] = 12; 549 + xe->pat.idx[XE_CACHE_WB_COMPRESSION] = 16; 549 550 } else if (GRAPHICS_VER(xe) == 30 || GRAPHICS_VER(xe) == 20) { 550 551 xe->pat.ops = &xe2_pat_ops; 551 552 if (GRAPHICS_VER(xe) == 30) { ··· 570 541 xe->pat.table = xe2_pat_table; 571 542 } 572 543 xe->pat.pat_ats = &xe2_pat_ats; 573 - if (IS_DGFX(xe)) 574 - xe->pat.pat_pta = &xe2_pat_pta; 544 + if (IS_DGFX(xe)) { 545 + xe->pat.pat_primary_pta = &xe2_pat_pta; 546 + xe->pat.pat_media_pta = &xe2_pat_pta; 547 + } 575 548 576 549 /* Wa_16023588340. XXX: Should use XE_WA */ 577 550 if (GRAPHICS_VERx100(xe) == 2001) ··· 631 600 GRAPHICS_VER(xe), GRAPHICS_VERx100(xe) % 100); 632 601 } 633 602 634 - /* VFs can't program nor dump PAT settings */ 635 - if (IS_SRIOV_VF(xe)) 636 - xe->pat.ops = NULL; 637 - 638 - xe_assert(xe, !xe->pat.ops || xe->pat.ops->dump); 639 - xe_assert(xe, !xe->pat.ops || xe->pat.ops->program_graphics); 640 - xe_assert(xe, !xe->pat.ops || MEDIA_VER(xe) < 13 || xe->pat.ops->program_media); 603 + xe_assert(xe, xe->pat.ops->dump); 604 + xe_assert(xe, xe->pat.ops->program_graphics); 605 + xe_assert(xe, MEDIA_VER(xe) < 13 || xe->pat.ops->program_media); 606 + xe_assert(xe, GRAPHICS_VER(xe) < 20 || xe->pat.ops->entry_dump); 641 607 } 642 608 643 609 void xe_pat_init(struct xe_gt *gt) 644 610 { 645 611 struct xe_device *xe = gt_to_xe(gt); 646 612 647 - if (!xe->pat.ops) 613 + if (IS_SRIOV_VF(xe)) 648 614 return; 649 615 650 616 if (xe_gt_is_media_type(gt)) ··· 661 633 { 662 634 struct xe_device *xe = gt_to_xe(gt); 663 635 664 - if (!xe->pat.ops) 636 + if (IS_SRIOV_VF(xe)) 665 637 return -EOPNOTSUPP; 666 638 667 639 return xe->pat.ops->dump(gt, p); ··· 677 649 int xe_pat_dump_sw_config(struct xe_gt *gt, struct drm_printer *p) 678 650 { 679 651 struct xe_device *xe = gt_to_xe(gt); 652 + const struct xe_pat_table_entry *pta_entry = xe_gt_is_main_type(gt) ? 
653 + xe->pat.pat_primary_pta : xe->pat.pat_media_pta; 680 654 char label[PAT_LABEL_LEN]; 681 655 682 656 if (!xe->pat.table || !xe->pat.n_entries) ··· 688 658 for (u32 i = 0; i < xe->pat.n_entries; i++) { 689 659 u32 pat = xe->pat.table[i].value; 690 660 691 - if (GRAPHICS_VERx100(xe) == 3511) { 661 + if (GRAPHICS_VER(xe) >= 20) { 692 662 xe_pat_index_label(label, sizeof(label), i); 693 - xe3p_xpc_pat_entry_dump(p, label, pat, !xe->pat.table[i].valid); 694 - } else if (GRAPHICS_VER(xe) == 30 || GRAPHICS_VER(xe) == 20) { 695 - xe_pat_index_label(label, sizeof(label), i); 696 - xe2_pat_entry_dump(p, label, pat, !xe->pat.table[i].valid); 663 + xe->pat.ops->entry_dump(p, label, pat, !xe->pat.table[i].valid); 697 664 } else if (xe->info.platform == XE_METEORLAKE) { 698 665 xelpg_pat_entry_dump(p, i, pat); 699 666 } else if (xe->info.platform == XE_PVC) { ··· 702 675 } 703 676 } 704 677 705 - if (xe->pat.pat_pta) { 706 - u32 pat = xe->pat.pat_pta->value; 678 + if (pta_entry) { 679 + u32 pat = pta_entry->value; 707 680 708 681 drm_printf(p, "Page Table Access:\n"); 709 - xe2_pat_entry_dump(p, "PTA_MODE", pat, false); 682 + xe->pat.ops->entry_dump(p, "PTA_MODE", pat, false); 710 683 } 711 684 712 685 if (xe->pat.pat_ats) { 713 686 u32 pat = xe->pat.pat_ats->value; 714 687 715 688 drm_printf(p, "PCIe ATS/PASID:\n"); 716 - xe2_pat_entry_dump(p, "PAT_ATS ", pat, false); 689 + xe->pat.ops->entry_dump(p, "PAT_ATS ", pat, false); 717 690 } 718 691 719 692 drm_printf(p, "Cache Level:\n");
+59
drivers/gpu/drm/xe/xe_pci.c
··· 52 52 53 53 static const struct xe_graphics_desc graphics_xelp = { 54 54 .hw_engine_mask = BIT(XE_HW_ENGINE_RCS0) | BIT(XE_HW_ENGINE_BCS0), 55 + .num_geometry_xecore_fuse_regs = 1, 55 56 }; 56 57 57 58 #define XE_HP_FEATURES \ ··· 63 62 BIT(XE_HW_ENGINE_RCS0) | BIT(XE_HW_ENGINE_BCS0) | 64 63 BIT(XE_HW_ENGINE_CCS0) | BIT(XE_HW_ENGINE_CCS1) | 65 64 BIT(XE_HW_ENGINE_CCS2) | BIT(XE_HW_ENGINE_CCS3), 65 + .num_geometry_xecore_fuse_regs = 1, 66 + .num_compute_xecore_fuse_regs = 1, 66 67 67 68 XE_HP_FEATURES, 68 69 }; ··· 84 81 .has_asid = 1, 85 82 .has_atomic_enable_pte_bit = 1, 86 83 .has_usm = 1, 84 + .num_compute_xecore_fuse_regs = 2, 87 85 }; 88 86 89 87 static const struct xe_graphics_desc graphics_xelpg = { 90 88 .hw_engine_mask = 91 89 BIT(XE_HW_ENGINE_RCS0) | BIT(XE_HW_ENGINE_BCS0) | 92 90 BIT(XE_HW_ENGINE_CCS0), 91 + .num_geometry_xecore_fuse_regs = 1, 92 + .num_compute_xecore_fuse_regs = 1, 93 93 94 94 XE_HP_FEATURES, 95 95 }; ··· 110 104 111 105 static const struct xe_graphics_desc graphics_xe2 = { 112 106 XE2_GFX_FEATURES, 107 + .num_geometry_xecore_fuse_regs = 3, 108 + .num_compute_xecore_fuse_regs = 3, 109 + }; 110 + 111 + static const struct xe_graphics_desc graphics_xe3p_lpg = { 112 + XE2_GFX_FEATURES, 113 + .multi_queue_engine_class_mask = BIT(XE_ENGINE_CLASS_COPY) | BIT(XE_ENGINE_CLASS_COMPUTE), 114 + .num_geometry_xecore_fuse_regs = 3, 115 + .num_compute_xecore_fuse_regs = 3, 113 116 }; 114 117 115 118 static const struct xe_graphics_desc graphics_xe3p_xpc = { ··· 127 112 .hw_engine_mask = 128 113 GENMASK(XE_HW_ENGINE_BCS8, XE_HW_ENGINE_BCS1) | 129 114 GENMASK(XE_HW_ENGINE_CCS3, XE_HW_ENGINE_CCS0), 115 + .multi_queue_engine_class_mask = BIT(XE_ENGINE_CLASS_COPY) | 116 + BIT(XE_ENGINE_CLASS_COMPUTE), 117 + .num_geometry_xecore_fuse_regs = 4, 118 + .num_compute_xecore_fuse_regs = 4, 130 119 }; 131 120 132 121 static const struct xe_media_desc media_xem = { ··· 165 146 { 3003, "Xe3_LPG", &graphics_xe2 }, 166 147 { 3004, "Xe3_LPG", &graphics_xe2 }, 167 148 { 3005, "Xe3_LPG", &graphics_xe2 }, 149 + { 3510, "Xe3p_LPG", &graphics_xe3p_lpg }, 168 150 { 3511, "Xe3p_XPC", &graphics_xe3p_xpc }, 169 151 }; 170 152 ··· 184 164 { 3503, "Xe3p_HPM", &media_xelpmp }, 185 165 }; 186 166 167 + #define MULTI_LRC_MASK \ 168 + .multi_lrc_mask = BIT(XE_ENGINE_CLASS_VIDEO_DECODE) | \ 169 + BIT(XE_ENGINE_CLASS_VIDEO_ENHANCE) 170 + 187 171 static const struct xe_device_desc tgl_desc = { 188 172 .pre_gmdid_graphics_ip = &graphics_ip_xelp, 189 173 .pre_gmdid_media_ip = &media_ip_xem, ··· 198 174 .has_llc = true, 199 175 .has_sriov = true, 200 176 .max_gt_per_tile = 1, 177 + MULTI_LRC_MASK, 201 178 .require_force_probe = true, 202 179 .va_bits = 48, 203 180 .vm_max_level = 3, ··· 213 188 .has_display = true, 214 189 .has_llc = true, 215 190 .max_gt_per_tile = 1, 191 + MULTI_LRC_MASK, 216 192 .require_force_probe = true, 217 193 .va_bits = 48, 218 194 .vm_max_level = 3, ··· 231 205 .has_llc = true, 232 206 .has_sriov = true, 233 207 .max_gt_per_tile = 1, 208 + MULTI_LRC_MASK, 234 209 .require_force_probe = true, 235 210 .subplatforms = (const struct xe_subplatform_desc[]) { 236 211 { XE_SUBPLATFORM_ALDERLAKE_S_RPLS, "RPLS", adls_rpls_ids }, ··· 253 226 .has_llc = true, 254 227 .has_sriov = true, 255 228 .max_gt_per_tile = 1, 229 + MULTI_LRC_MASK, 256 230 .require_force_probe = true, 257 231 .subplatforms = (const struct xe_subplatform_desc[]) { 258 232 { XE_SUBPLATFORM_ALDERLAKE_P_RPLU, "RPLU", adlp_rplu_ids }, ··· 273 245 .has_llc = true, 274 246 .has_sriov = true, 275 247 .max_gt_per_tile = 1, 248 + 
MULTI_LRC_MASK, 276 249 .require_force_probe = true, 277 250 .va_bits = 48, 278 251 .vm_max_level = 3, ··· 292 263 .has_gsc_nvm = 1, 293 264 .has_heci_gscfi = 1, 294 265 .max_gt_per_tile = 1, 266 + MULTI_LRC_MASK, 295 267 .require_force_probe = true, 296 268 .va_bits = 48, 297 269 .vm_max_level = 3, ··· 323 293 .pre_gmdid_media_ip = &media_ip_xehpm, 324 294 .dma_mask_size = 46, 325 295 .max_gt_per_tile = 1, 296 + MULTI_LRC_MASK, 326 297 .require_force_probe = true, 327 298 328 299 DG2_FEATURES, ··· 336 305 .pre_gmdid_media_ip = &media_ip_xehpm, 337 306 .dma_mask_size = 46, 338 307 .max_gt_per_tile = 1, 308 + MULTI_LRC_MASK, 339 309 .require_force_probe = true, 340 310 341 311 DG2_FEATURES, ··· 355 323 .has_heci_gscfi = 1, 356 324 .max_gt_per_tile = 1, 357 325 .max_remote_tiles = 1, 326 + MULTI_LRC_MASK, 358 327 .require_force_probe = true, 359 328 .va_bits = 57, 360 329 .vm_max_level = 4, ··· 371 338 .has_display = true, 372 339 .has_pxp = true, 373 340 .max_gt_per_tile = 2, 341 + MULTI_LRC_MASK, 374 342 .va_bits = 48, 375 343 .vm_max_level = 3, 376 344 }; ··· 383 349 .has_flat_ccs = 1, 384 350 .has_pxp = true, 385 351 .max_gt_per_tile = 2, 352 + MULTI_LRC_MASK, 386 353 .needs_scratch = true, 387 354 .va_bits = 48, 388 355 .vm_max_level = 4, ··· 408 373 .has_soc_remapper_telem = true, 409 374 .has_sriov = true, 410 375 .max_gt_per_tile = 2, 376 + MULTI_LRC_MASK, 411 377 .needs_scratch = true, 412 378 .subplatforms = (const struct xe_subplatform_desc[]) { 413 379 { XE_SUBPLATFORM_BATTLEMAGE_G21, "G21", bmg_g21_ids }, ··· 427 391 .has_pre_prod_wa = 1, 428 392 .has_pxp = true, 429 393 .max_gt_per_tile = 2, 394 + MULTI_LRC_MASK, 430 395 .needs_scratch = true, 431 396 .needs_shared_vf_gt_wq = true, 432 397 .va_bits = 48, ··· 441 404 .has_flat_ccs = 1, 442 405 .has_pre_prod_wa = 1, 443 406 .max_gt_per_tile = 2, 407 + MULTI_LRC_MASK, 444 408 .require_force_probe = true, 445 409 .va_bits = 48, 446 410 .vm_max_level = 4, ··· 463 425 .has_soc_remapper_telem = true, 464 426 .has_sriov = true, 465 427 .max_gt_per_tile = 2, 428 + MULTI_LRC_MASK, 466 429 .require_force_probe = true, 467 430 .va_bits = 57, 431 + .vm_max_level = 4, 432 + }; 433 + 434 + static const struct xe_device_desc nvlp_desc = { 435 + PLATFORM(NOVALAKE_P), 436 + .dma_mask_size = 46, 437 + .has_cached_pt = true, 438 + .has_display = true, 439 + .has_flat_ccs = 1, 440 + .has_page_reclaim_hw_assist = true, 441 + .has_pre_prod_wa = true, 442 + .max_gt_per_tile = 2, 443 + MULTI_LRC_MASK, 444 + .require_force_probe = true, 445 + .va_bits = 48, 468 446 .vm_max_level = 4, 469 447 }; 470 448 ··· 513 459 INTEL_WCL_IDS(INTEL_VGA_DEVICE, &ptl_desc), 514 460 INTEL_NVLS_IDS(INTEL_VGA_DEVICE, &nvls_desc), 515 461 INTEL_CRI_IDS(INTEL_PCI_DEVICE, &cri_desc), 462 + INTEL_NVLP_IDS(INTEL_VGA_DEVICE, &nvlp_desc), 516 463 { } 517 464 }; 518 465 MODULE_DEVICE_TABLE(pci, pciidlist); ··· 765 710 xe->info.skip_pcode = desc->skip_pcode; 766 711 xe->info.needs_scratch = desc->needs_scratch; 767 712 xe->info.needs_shared_vf_gt_wq = desc->needs_shared_vf_gt_wq; 713 + xe->info.multi_lrc_mask = desc->multi_lrc_mask; 768 714 769 715 xe->info.probe_display = IS_ENABLED(CONFIG_DRM_XE_DISPLAY) && 770 716 xe_modparam.probe_display && ··· 842 786 gt->info.has_indirect_ring_state = graphics_desc->has_indirect_ring_state; 843 787 gt->info.multi_queue_engine_class_mask = graphics_desc->multi_queue_engine_class_mask; 844 788 gt->info.engine_mask = graphics_desc->hw_engine_mask; 789 + gt->info.num_geometry_xecore_fuse_regs = graphics_desc->num_geometry_xecore_fuse_regs; 790 + 
gt->info.num_compute_xecore_fuse_regs = graphics_desc->num_compute_xecore_fuse_regs; 845 791 846 792 /* 847 793 * Before media version 13, the media IP was part of the primary GT ··· 950 892 xe->info.has_device_atomics_on_smem = 1; 951 893 952 894 xe->info.has_range_tlb_inval = graphics_desc->has_range_tlb_inval; 895 + xe->info.has_ctx_tlb_inval = graphics_desc->has_ctx_tlb_inval; 953 896 xe->info.has_usm = graphics_desc->has_usm; 954 897 xe->info.has_64bit_timestamp = graphics_desc->has_64bit_timestamp; 955 898 xe->info.has_mem_copy_instr = GRAPHICS_VER(xe) >= 20;
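Note: the MULTI_LRC_MASK macro above stamps .multi_lrc_mask into each platform descriptor as a bitmask of engine classes (video decode/enhance only), and xe_info_init() copies it into xe->info.multi_lrc_mask. A minimal sketch of how a submission path could consult it follows; the helper name is hypothetical and the actual enforcement site is not part of this hunk.

/*
 * Illustrative only: gate multi-LRC (parallel) queue creation on the
 * per-platform engine-class bitmask. Helper name is made up.
 */
static bool xe_class_supports_multi_lrc(struct xe_device *xe, u16 engine_class)
{
	if (engine_class >= BITS_PER_TYPE(xe->info.multi_lrc_mask))
		return false;

	return xe->info.multi_lrc_mask & BIT(engine_class);
}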
+4
drivers/gpu/drm/xe/xe_pci_types.h
··· 30 30 u8 dma_mask_size; 31 31 u8 max_remote_tiles:2; 32 32 u8 max_gt_per_tile:2; 33 + u8 multi_lrc_mask; 33 34 u8 va_bits; 34 35 u8 vm_max_level; 35 36 u8 vram_flags; ··· 67 66 struct xe_graphics_desc { 68 67 u64 hw_engine_mask; /* hardware engines provided by graphics IP */ 69 68 u16 multi_queue_engine_class_mask; /* bitmask of engine classes which support multi queue */ 69 + u8 num_geometry_xecore_fuse_regs; 70 + u8 num_compute_xecore_fuse_regs; 70 71 71 72 u8 has_asid:1; 72 73 u8 has_atomic_enable_pte_bit:1; 73 74 u8 has_indirect_ring_state:1; 74 75 u8 has_range_tlb_inval:1; 76 + u8 has_ctx_tlb_inval:1; 75 77 u8 has_usm:1; 76 78 u8 has_64bit_timestamp:1; 77 79 };
+1
drivers/gpu/drm/xe/xe_platform_types.h
··· 26 26 XE_PANTHERLAKE, 27 27 XE_NOVALAKE_S, 28 28 XE_CRESCENTISLAND, 29 + XE_NOVALAKE_P, 29 30 }; 30 31 31 32 enum xe_subplatform {
-3
drivers/gpu/drm/xe/xe_query.c
···
 		return -EINVAL;
 
 	eci = &resp.eci;
-	if (eci->gt_id >= xe->info.max_gt_per_tile)
-		return -EINVAL;
-
 	gt = xe_device_get_gt(xe, eci->gt_id);
 	if (!gt)
 		return -EINVAL;

+73 -3
drivers/gpu/drm/xe/xe_reg_sr.c
··· 13 13 #include <drm/drm_managed.h> 14 14 #include <drm/drm_print.h> 15 15 16 + #include "xe_assert.h" 16 17 #include "xe_device.h" 17 18 #include "xe_device_types.h" 18 19 #include "xe_force_wake.h" ··· 21 20 #include "xe_gt_printk.h" 22 21 #include "xe_gt_types.h" 23 22 #include "xe_hw_engine_types.h" 23 + #include "xe_lrc.h" 24 24 #include "xe_mmio.h" 25 25 #include "xe_rtp_types.h" 26 26 ··· 100 98 *pentry = *e; 101 99 ret = xa_err(xa_store(&sr->xa, idx, pentry, GFP_KERNEL)); 102 100 if (ret) 103 - goto fail; 101 + goto fail_free; 104 102 105 103 return 0; 106 104 105 + fail_free: 106 + kfree(pentry); 107 107 fail: 108 108 xe_gt_err(gt, 109 109 "discarding save-restore reg %04lx (clear: %08x, set: %08x, masked: %s, mcr: %s): ret=%d\n", ··· 173 169 if (xa_empty(&sr->xa)) 174 170 return; 175 171 176 - if (IS_SRIOV_VF(gt_to_xe(gt))) 177 - return; 172 + /* 173 + * We don't process non-LRC reg_sr lists in VF, so they should have 174 + * been empty in the check above. 175 + */ 176 + xe_gt_assert(gt, !IS_SRIOV_VF(gt_to_xe(gt))); 178 177 179 178 xe_gt_dbg(gt, "Applying %s save-restore MMIOs\n", sr->name); 180 179 ··· 210 203 reg, entry->clr_bits, entry->set_bits, 211 204 str_yes_no(entry->reg.masked), 212 205 str_yes_no(entry->reg.mcr)); 206 + } 207 + 208 + static u32 readback_reg(struct xe_gt *gt, struct xe_reg reg) 209 + { 210 + struct xe_reg_mcr mcr_reg = to_xe_reg_mcr(reg); 211 + 212 + if (reg.mcr) 213 + return xe_gt_mcr_unicast_read_any(gt, mcr_reg); 214 + else 215 + return xe_mmio_read32(&gt->mmio, reg); 216 + } 217 + 218 + /** 219 + * xe_reg_sr_readback_check() - Readback registers referenced in save/restore 220 + * entries and check whether the programming is in place. 221 + * @sr: Save/restore entries 222 + * @gt: GT to read register from 223 + * @p: DRM printer to report discrepancies on 224 + */ 225 + void xe_reg_sr_readback_check(struct xe_reg_sr *sr, 226 + struct xe_gt *gt, 227 + struct drm_printer *p) 228 + { 229 + struct xe_reg_sr_entry *entry; 230 + unsigned long offset; 231 + 232 + xa_for_each(&sr->xa, offset, entry) { 233 + u32 val = readback_reg(gt, entry->reg); 234 + u32 mask = entry->clr_bits | entry->set_bits; 235 + 236 + if ((val & mask) != entry->set_bits) 237 + drm_printf(p, "%#8lx & %#10x :: expected %#10x got %#10x\n", 238 + offset, mask, entry->set_bits, val & mask); 239 + } 240 + } 241 + 242 + /** 243 + * xe_reg_sr_lrc_check() - Check LRC for registers referenced in save/restore 244 + * entries and check whether the programming is in place. 245 + * @sr: Save/restore entries 246 + * @gt: GT to read register from 247 + * @hwe: Hardware engine type to check LRC for 248 + * @p: DRM printer to report discrepancies on 249 + */ 250 + void xe_reg_sr_lrc_check(struct xe_reg_sr *sr, 251 + struct xe_gt *gt, 252 + struct xe_hw_engine *hwe, 253 + struct drm_printer *p) 254 + { 255 + struct xe_reg_sr_entry *entry; 256 + unsigned long offset; 257 + 258 + xa_for_each(&sr->xa, offset, entry) { 259 + u32 val; 260 + int ret = xe_lrc_lookup_default_reg_value(gt, hwe->class, offset, &val); 261 + u32 mask = entry->clr_bits | entry->set_bits; 262 + 263 + if (ret == -ENOENT) 264 + drm_printf(p, "%#8lx :: not found in LRC for %s\n", offset, hwe->name); 265 + else if ((val & mask) != entry->set_bits) 266 + drm_printf(p, "%#8lx & %#10x :: expected %#10x got %#10x\n", 267 + offset, mask, entry->set_bits, val & mask); 268 + } 213 269 }
+7
drivers/gpu/drm/xe/xe_reg_sr.h
··· 19 19 20 20 int xe_reg_sr_init(struct xe_reg_sr *sr, const char *name, struct xe_device *xe); 21 21 void xe_reg_sr_dump(struct xe_reg_sr *sr, struct drm_printer *p); 22 + void xe_reg_sr_readback_check(struct xe_reg_sr *sr, 23 + struct xe_gt *gt, 24 + struct drm_printer *p); 25 + void xe_reg_sr_lrc_check(struct xe_reg_sr *sr, 26 + struct xe_gt *gt, 27 + struct xe_hw_engine *hwe, 28 + struct drm_printer *p); 22 29 23 30 int xe_reg_sr_add(struct xe_reg_sr *sr, const struct xe_reg_sr_entry *e, 24 31 struct xe_gt *gt);
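Note: the two checkers declared above compare live MMIO state and LRC default values against every save-restore entry and print mismatches. A hypothetical debugfs-style consumer is sketched below; only xe_reg_sr_readback_check(), xe_reg_sr_lrc_check() and drm_seq_file_printer() are taken from the tree, the callback name, the gt stashing in m->private, and the omitted forcewake handling are assumptions.

/* Hypothetical debugfs show callback wiring the new checkers to a seq_file. */
static int reg_sr_mismatches_show(struct seq_file *m, void *data)
{
	struct xe_gt *gt = m->private;			/* assumed layout */
	struct drm_printer p = drm_seq_file_printer(m);
	struct xe_hw_engine *hwe;
	enum xe_hw_engine_id id;

	xe_reg_sr_readback_check(&gt->reg_sr, gt, &p);		/* GT-level MMIOs */

	for_each_hw_engine(hwe, gt, id) {
		xe_reg_sr_readback_check(&hwe->reg_sr, gt, &p);	/* engine MMIOs */
		xe_reg_sr_lrc_check(&hwe->reg_lrc, gt, hwe, &p);	/* LRC defaults */
	}

	return 0;
}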
+10 -2
drivers/gpu/drm/xe/xe_reg_whitelist.c
··· 75 75 XE_RTP_ACTIONS(WHITELIST(CSBE_DEBUG_STATUS(RENDER_RING_BASE), 0)) 76 76 }, 77 77 { XE_RTP_NAME("14024997852"), 78 - XE_RTP_RULES(GRAPHICS_VERSION_RANGE(3000, 3005), ENGINE_CLASS(RENDER)), 78 + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, 3005), ENGINE_CLASS(RENDER)), 79 + XE_RTP_ACTIONS(WHITELIST(FF_MODE, 80 + RING_FORCE_TO_NONPRIV_ACCESS_RW), 81 + WHITELIST(VFLSKPD, 82 + RING_FORCE_TO_NONPRIV_ACCESS_RW)) 83 + }, 84 + { XE_RTP_NAME("14024997852"), 85 + XE_RTP_RULES(GRAPHICS_VERSION(3510), GRAPHICS_STEP(A0, B0), 86 + ENGINE_CLASS(RENDER)), 79 87 XE_RTP_ACTIONS(WHITELIST(FF_MODE, 80 88 RING_FORCE_TO_NONPRIV_ACCESS_RW), 81 89 WHITELIST(VFLSKPD, ··· 189 181 struct xe_rtp_process_ctx ctx = XE_RTP_PROCESS_CTX_INITIALIZER(hwe); 190 182 191 183 xe_rtp_process_to_sr(&ctx, register_whitelist, ARRAY_SIZE(register_whitelist), 192 - &hwe->reg_whitelist); 184 + &hwe->reg_whitelist, false); 193 185 whitelist_apply_to_hwe(hwe); 194 186 } 195 187
+9
drivers/gpu/drm/xe/xe_ring_ops.c
··· 280 280 281 281 i = emit_bb_start(batch_addr, ppgtt_flag, dw, i); 282 282 283 + /* Don't preempt fence signaling */ 284 + dw[i++] = MI_ARB_ON_OFF | MI_ARB_DISABLE; 285 + 283 286 if (job->user_fence.used) { 284 287 i = emit_flush_dw(dw, i); 285 288 i = emit_store_imm_ppgtt_posted(job->user_fence.addr, ··· 348 345 349 346 i = emit_bb_start(batch_addr, ppgtt_flag, dw, i); 350 347 348 + /* Don't preempt fence signaling */ 349 + dw[i++] = MI_ARB_ON_OFF | MI_ARB_DISABLE; 350 + 351 351 if (job->user_fence.used) { 352 352 i = emit_flush_dw(dw, i); 353 353 i = emit_store_imm_ppgtt_posted(job->user_fence.addr, ··· 402 396 seqno, dw, i); 403 397 404 398 i = emit_bb_start(batch_addr, ppgtt_flag, dw, i); 399 + 400 + /* Don't preempt fence signaling */ 401 + dw[i++] = MI_ARB_ON_OFF | MI_ARB_DISABLE; 405 402 406 403 i = emit_render_cache_flush(job, dw, i); 407 404
+7 -1
drivers/gpu/drm/xe/xe_rtp.c
··· 270 270 * @sr: Save-restore struct where matching rules execute the action. This can be 271 271 * viewed as the "coalesced view" of multiple the tables. The bits for each 272 272 * register set are expected not to collide with previously added entries 273 + * @process_in_vf: Whether this RTP table should get processed for SR-IOV VF 274 + * devices. Should generally only be 'true' for LRC tables. 273 275 * 274 276 * Walk the table pointed by @entries (with an empty sentinel) and add all 275 277 * entries with matching rules to @sr. If @hwe is not NULL, its mmio_base is ··· 280 278 void xe_rtp_process_to_sr(struct xe_rtp_process_ctx *ctx, 281 279 const struct xe_rtp_entry_sr *entries, 282 280 size_t n_entries, 283 - struct xe_reg_sr *sr) 281 + struct xe_reg_sr *sr, 282 + bool process_in_vf) 284 283 { 285 284 const struct xe_rtp_entry_sr *entry; 286 285 struct xe_hw_engine *hwe = NULL; ··· 289 286 struct xe_device *xe = NULL; 290 287 291 288 rtp_get_context(ctx, &hwe, &gt, &xe); 289 + 290 + if (!process_in_vf && IS_SRIOV_VF(xe)) 291 + return; 292 292 293 293 xe_assert(xe, entries); 294 294
+2 -1
drivers/gpu/drm/xe/xe_rtp.h
··· 431 431 432 432 void xe_rtp_process_to_sr(struct xe_rtp_process_ctx *ctx, 433 433 const struct xe_rtp_entry_sr *entries, 434 - size_t n_entries, struct xe_reg_sr *sr); 434 + size_t n_entries, struct xe_reg_sr *sr, 435 + bool process_in_vf); 435 436 436 437 void xe_rtp_process(struct xe_rtp_process_ctx *ctx, 437 438 const struct xe_rtp_entry *entries);
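Note: with the extra process_in_vf argument, callers now state explicitly whether a table may be processed on a VF; per the kernel-doc in xe_rtp.c this should generally only be true for LRC tables. For reference, the updated call shape as used by the tuning code later in this series:

/* Non-LRC tables are skipped on VFs (the registers are PF-owned): */
xe_rtp_process_to_sr(&ctx, gt_tunings, ARRAY_SIZE(gt_tunings),
		     &gt->reg_sr, false);

/* LRC tables are still processed on VFs: */
xe_rtp_process_to_sr(&ctx, lrc_tunings, ARRAY_SIZE(lrc_tunings),
		     &hwe->reg_lrc, true);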
+36
drivers/gpu/drm/xe/xe_sa.c
··· 89 89 if (ret) 90 90 return ERR_PTR(ret); 91 91 92 + if (IS_ENABLED(CONFIG_PROVE_LOCKING)) { 93 + fs_reclaim_acquire(GFP_KERNEL); 94 + might_lock(&sa_manager->swap_guard); 95 + fs_reclaim_release(GFP_KERNEL); 96 + } 97 + 92 98 shadow = xe_managed_bo_create_pin_map(xe, tile, size, 93 99 XE_BO_FLAG_VRAM_IF_DGFX(tile) | 94 100 XE_BO_FLAG_GGTT | ··· 179 173 return ERR_PTR(-ENOBUFS); 180 174 181 175 return drm_suballoc_new(&sa_manager->base, size, gfp, true, 0); 176 + } 177 + 178 + /** 179 + * xe_sa_bo_alloc() - Allocate uninitialized suballoc object. 180 + * @gfp: gfp flags used for memory allocation. 181 + * 182 + * Allocate memory for an uninitialized suballoc object. Intended usage is 183 + * allocate memory for suballoc object outside of a reclaim tainted context 184 + * and then be initialized at a later time in a reclaim tainted context. 185 + * 186 + * Return: a new uninitialized suballoc object, or an ERR_PTR(-ENOMEM). 187 + */ 188 + struct drm_suballoc *xe_sa_bo_alloc(gfp_t gfp) 189 + { 190 + return drm_suballoc_alloc(gfp); 191 + } 192 + 193 + /** 194 + * xe_sa_bo_init() - Initialize a suballocation. 195 + * @sa_manager: pointer to the sa_manager 196 + * @sa: The struct drm_suballoc. 197 + * @size: number of bytes we want to suballocate. 198 + * 199 + * Try to make a suballocation on a pre-allocated suballoc object of size @size. 200 + * 201 + * Return: zero on success, errno on failure. 202 + */ 203 + int xe_sa_bo_init(struct xe_sa_manager *sa_manager, struct drm_suballoc *sa, size_t size) 204 + { 205 + return drm_suballoc_insert(&sa_manager->base, sa, size, true, 0); 182 206 } 183 207 184 208 /**
+2
drivers/gpu/drm/xe/xe_sa.h
··· 38 38 return __xe_sa_bo_new(sa_manager, size, GFP_KERNEL); 39 39 } 40 40 41 + struct drm_suballoc *xe_sa_bo_alloc(gfp_t gfp); 42 + int xe_sa_bo_init(struct xe_sa_manager *sa_manager, struct drm_suballoc *sa, size_t size); 41 43 void xe_sa_bo_flush_write(struct drm_suballoc *sa_bo); 42 44 void xe_sa_bo_sync_read(struct drm_suballoc *sa_bo); 43 45 void xe_sa_bo_free(struct drm_suballoc *sa_bo, struct dma_fence *fence);
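Note: the xe_sa_bo_alloc()/xe_sa_bo_init() split lets the memory allocation happen before entering a reclaim-tainted section, with the actual suballocation deferred, as described in the xe_sa.c kernel-doc above. A usage sketch follows; sa_manager and the size are assumed to be in scope, and cleanup of the unused object on init failure (which depends on the drm_suballoc helpers) is omitted.

struct drm_suballoc *sa;
int err;

/* Plain allocation, safe before any reclaim-tainted region. */
sa = xe_sa_bo_alloc(GFP_KERNEL);
if (IS_ERR(sa))
	return PTR_ERR(sa);

/* Later, e.g. under a dma-fence/reclaim-tainted path: carve out the range. */
err = xe_sa_bo_init(sa_manager, sa, SZ_4K);
if (err)
	return err;	/* freeing the unused object is omitted in this sketch */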
+57
drivers/gpu/drm/xe/xe_sleep.h
···
+ /* SPDX-License-Identifier: MIT */
+ /*
+  * Copyright © 2026 Intel Corporation
+  */
+ 
+ #ifndef _XE_SLEEP_H_
+ #define _XE_SLEEP_H_
+ 
+ #include <linux/delay.h>
+ #include <linux/math64.h>
+ 
+ /**
+  * xe_sleep_relaxed_ms() - Sleep for an approximate time.
+  * @delay_ms: time in msec to sleep
+  *
+  * For smaller timeouts, sleep with 0.5ms accuracy.
+  */
+ static inline void xe_sleep_relaxed_ms(unsigned int delay_ms)
+ {
+ 	unsigned long min_us, max_us;
+ 
+ 	if (!delay_ms)
+ 		return;
+ 
+ 	if (delay_ms > 20) {
+ 		msleep(delay_ms);
+ 		return;
+ 	}
+ 
+ 	min_us = mul_u32_u32(delay_ms, 1000);
+ 	max_us = min_us + 500;
+ 
+ 	usleep_range(min_us, max_us);
+ }
+ 
+ /**
+  * xe_sleep_exponential_ms() - Sleep for an exponentially increased time.
+  * @sleep_period_ms: current time in msec to sleep
+  * @max_sleep_ms: maximum time in msec to sleep
+  *
+  * Sleep for the @sleep_period_ms and exponentially increase this time for the
+  * next loop, unless reaching the @max_sleep_ms limit.
+  *
+  * Return: approximate time in msec the task was delayed.
+  */
+ static inline unsigned int xe_sleep_exponential_ms(unsigned int *sleep_period_ms,
+ 						   unsigned int max_sleep_ms)
+ {
+ 	unsigned int delay_ms = *sleep_period_ms;
+ 	unsigned int next_delay_ms = 2 * delay_ms;
+ 
+ 	xe_sleep_relaxed_ms(delay_ms);
+ 	*sleep_period_ms = min(next_delay_ms, max_sleep_ms);
+ 	return delay_ms;
+ }
+ 
+ #endif
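Note: a sketch of the intended polling pattern for xe_sleep_exponential_ms(): start with a short period, let it grow up to the cap, and accumulate the returned delays against an overall timeout. done(), TIMEOUT_MS and the 50 ms cap are placeholders, not taken from the tree.

unsigned int period_ms = 1;	/* initial backoff, doubles each iteration */
unsigned int waited_ms = 0;

while (!done()) {			/* done() is a placeholder predicate */
	if (waited_ms >= TIMEOUT_MS)	/* TIMEOUT_MS is a placeholder bound */
		return -ETIMEDOUT;

	/* sleeps ~period_ms, doubles it for next time, capped at 50 ms */
	waited_ms += xe_sleep_exponential_ms(&period_ms, 50);
}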
+1
drivers/gpu/drm/xe/xe_soc_remapper.c
··· 4 4 */ 5 5 6 6 #include "regs/xe_soc_remapper_regs.h" 7 + #include "xe_device.h" 7 8 #include "xe_mmio.h" 8 9 #include "xe_soc_remapper.h" 9 10
+1 -1
drivers/gpu/drm/xe/xe_sriov.c
··· 120 120 xe_sriov_vf_init_early(xe); 121 121 122 122 xe_assert(xe, !xe->sriov.wq); 123 - xe->sriov.wq = alloc_workqueue("xe-sriov-wq", 0, 0); 123 + xe->sriov.wq = alloc_workqueue("xe-sriov-wq", WQ_PERCPU, 0); 124 124 if (!xe->sriov.wq) 125 125 return -ENOMEM; 126 126
+2 -5
drivers/gpu/drm/xe/xe_sriov.h
··· 28 28 29 29 static inline bool xe_device_is_sriov_pf(const struct xe_device *xe) 30 30 { 31 - return xe_device_sriov_mode(xe) == XE_SRIOV_MODE_PF; 31 + return IS_ENABLED(CONFIG_PCI_IOV) && 32 + xe_device_sriov_mode(xe) == XE_SRIOV_MODE_PF; 32 33 } 33 34 34 35 static inline bool xe_device_is_sriov_vf(const struct xe_device *xe) ··· 37 36 return xe_device_sriov_mode(xe) == XE_SRIOV_MODE_VF; 38 37 } 39 38 40 - #ifdef CONFIG_PCI_IOV 41 39 #define IS_SRIOV_PF(xe) xe_device_is_sriov_pf(xe) 42 - #else 43 - #define IS_SRIOV_PF(xe) (typecheck(struct xe_device *, (xe)) && false) 44 - #endif 45 40 #define IS_SRIOV_VF(xe) xe_device_is_sriov_vf(xe) 46 41 47 42 #define IS_SRIOV(xe) (IS_SRIOV_PF(xe) || IS_SRIOV_VF(xe))
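Note: folding the CONFIG_PCI_IOV test into xe_device_is_sriov_pf() keeps PF-only branches dead-code-eliminated on !CONFIG_PCI_IOV builds without the old typecheck macro. A minimal illustration; the callee name is hypothetical.

/*
 * With CONFIG_PCI_IOV=n, IS_ENABLED(CONFIG_PCI_IOV) is a compile-time 0,
 * so the branch below is folded away just as with the old macro variant.
 */
if (IS_SRIOV_PF(xe))
	setup_pf_only_state(xe);	/* hypothetical PF-only call */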
+7 -3
drivers/gpu/drm/xe/xe_sriov_pf.c
··· 20 20 #include "xe_sriov_pf_sysfs.h" 21 21 #include "xe_sriov_printk.h" 22 22 23 + static bool wanted_admin_only(struct xe_device *xe) 24 + { 25 + return xe_configfs_admin_only_pf(to_pci_dev(xe->drm.dev)); 26 + } 27 + 23 28 static unsigned int wanted_max_vfs(struct xe_device *xe) 24 29 { 25 - if (IS_ENABLED(CONFIG_CONFIGFS_FS)) 26 - return xe_configfs_get_max_vfs(to_pci_dev(xe->drm.dev)); 27 - return xe_modparam.max_vfs; 30 + return xe_configfs_get_max_vfs(to_pci_dev(xe->drm.dev)); 28 31 } 29 32 30 33 static int pf_reduce_totalvfs(struct xe_device *xe, int limit) ··· 79 76 80 77 pf_reduce_totalvfs(xe, newlimit); 81 78 79 + xe->sriov.pf.admin_only = wanted_admin_only(xe); 82 80 xe->sriov.pf.device_total_vfs = totalvfs; 83 81 xe->sriov.pf.driver_max_vfs = newlimit; 84 82
+2 -1
drivers/gpu/drm/xe/xe_sriov_pf_helpers.h
··· 56 56 */ 57 57 static inline bool xe_sriov_pf_admin_only(const struct xe_device *xe) 58 58 { 59 - return !xe->info.probe_display; 59 + xe_assert(xe, IS_SRIOV_PF(xe)); 60 + return xe->sriov.pf.admin_only; 60 61 } 61 62 62 63 static inline struct mutex *xe_sriov_pf_master_mutex(struct xe_device *xe)
+106 -13
drivers/gpu/drm/xe/xe_sriov_pf_provision.c
··· 7 7 #include "xe_device.h" 8 8 #include "xe_gt_sriov_pf_config.h" 9 9 #include "xe_gt_sriov_pf_policy.h" 10 + #include "xe_lmtt.h" 10 11 #include "xe_sriov.h" 11 12 #include "xe_sriov_pf_helpers.h" 12 13 #include "xe_sriov_pf_provision.h" ··· 33 32 return xe->sriov.pf.provision.mode == XE_SRIOV_PROVISIONING_MODE_AUTO; 34 33 } 35 34 36 - static bool pf_needs_provisioning(struct xe_gt *gt, unsigned int num_vfs) 37 - { 38 - unsigned int n; 39 - 40 - for (n = 1; n <= num_vfs; n++) 41 - if (!xe_gt_sriov_pf_config_is_empty(gt, n)) 42 - return false; 43 - 44 - return true; 45 - } 46 - 47 35 static int pf_provision_vfs(struct xe_device *xe, unsigned int num_vfs) 48 36 { 49 37 struct xe_gt *gt; ··· 41 51 int err; 42 52 43 53 for_each_gt(gt, xe, id) { 44 - if (!pf_needs_provisioning(gt, num_vfs)) 45 - return -EUCLEAN; 46 54 err = xe_gt_sriov_pf_config_set_fair(gt, VFID(1), num_vfs); 47 55 result = result ?: err; 48 56 } ··· 423 435 } 424 436 425 437 return !count ? -ENODATA : 0; 438 + } 439 + 440 + static u64 vram_per_tile(struct xe_tile *tile, u64 total) 441 + { 442 + struct xe_device *xe = tile->xe; 443 + unsigned int tcount = xe->info.tile_count; 444 + u64 alignment = xe_lmtt_page_size(&tile->sriov.pf.lmtt); 445 + 446 + total = round_up(total, tcount * alignment); 447 + return div_u64(total, tcount); 448 + } 449 + 450 + /** 451 + * xe_sriov_pf_provision_bulk_apply_vram() - Change VRAM provisioning for all VFs. 452 + * @xe: the PF &xe_device 453 + * @size: the VRAM size in [bytes] to set 454 + * 455 + * Change all VFs VRAM (LMEM) provisioning on all tiles. 456 + * 457 + * This function can only be called on PF. 458 + * 459 + * Return: 0 on success or a negative error code on failure. 460 + */ 461 + int xe_sriov_pf_provision_bulk_apply_vram(struct xe_device *xe, u64 size) 462 + { 463 + unsigned int num_vfs = xe_sriov_pf_get_totalvfs(xe); 464 + struct xe_tile *tile; 465 + unsigned int id; 466 + int result = 0; 467 + int err; 468 + 469 + xe_assert(xe, xe_device_has_lmtt(xe)); 470 + 471 + guard(mutex)(xe_sriov_pf_master_mutex(xe)); 472 + 473 + for_each_tile(tile, xe, id) { 474 + err = xe_gt_sriov_pf_config_bulk_set_lmem_locked(tile->primary_gt, 475 + VFID(1), num_vfs, 476 + vram_per_tile(tile, size)); 477 + result = result ?: err; 478 + } 479 + 480 + return result; 481 + } 482 + 483 + /** 484 + * xe_sriov_pf_provision_apply_vf_vram() - Change single VF VRAM allocation. 485 + * @xe: the PF &xe_device 486 + * @vfid: the VF identifier (can't be 0 == PFID) 487 + * @size: VRAM size to set 488 + * 489 + * Change VF's VRAM provisioning on all tiles/GTs. 490 + * 491 + * This function can only be called on PF. 492 + * 493 + * Return: 0 on success or a negative error code on failure. 494 + */ 495 + int xe_sriov_pf_provision_apply_vf_vram(struct xe_device *xe, unsigned int vfid, u64 size) 496 + { 497 + struct xe_tile *tile; 498 + unsigned int id; 499 + int result = 0; 500 + int err; 501 + 502 + xe_assert(xe, vfid); 503 + xe_assert(xe, xe_device_has_lmtt(xe)); 504 + 505 + guard(mutex)(xe_sriov_pf_master_mutex(xe)); 506 + 507 + for_each_tile(tile, xe, id) { 508 + err = xe_gt_sriov_pf_config_set_lmem_locked(tile->primary_gt, vfid, 509 + vram_per_tile(tile, size)); 510 + result = result ?: err; 511 + } 512 + 513 + return result; 514 + } 515 + 516 + /** 517 + * xe_sriov_pf_provision_query_vf_vram() - Query VF's VRAM allocation. 
518 + * @xe: the PF &xe_device 519 + * @vfid: the VF identifier (can't be 0 == PFID) 520 + * @size: placeholder for the returned VRAM size 521 + * 522 + * Query VF's VRAM provisioning from all tiles/GTs. 523 + * 524 + * This function can only be called on PF. 525 + * 526 + * Return: 0 on success or a negative error code on failure. 527 + */ 528 + int xe_sriov_pf_provision_query_vf_vram(struct xe_device *xe, unsigned int vfid, u64 *size) 529 + { 530 + struct xe_tile *tile; 531 + unsigned int id; 532 + u64 total = 0; 533 + 534 + xe_assert(xe, vfid); 535 + 536 + guard(mutex)(xe_sriov_pf_master_mutex(xe)); 537 + 538 + for_each_tile(tile, xe, id) 539 + total += xe_gt_sriov_pf_config_get_lmem_locked(tile->primary_gt, vfid); 540 + 541 + *size = total; 542 + return 0; 426 543 }
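Note: vram_per_tile() rounds the requested total up to a multiple of tile_count * LMTT page size before splitting it evenly, so each tile's share stays LMTT-aligned. Worked example with assumed numbers (2 tiles, a 2 MiB xe_lmtt_page_size(), so a 4 MiB rounding granule): a request of 1025 MiB is rounded up to 1028 MiB and each tile gets 514 MiB. The actual LMTT page size depends on the platform.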
+4
drivers/gpu/drm/xe/xe_sriov_pf_provision.h
··· 24 24 int xe_sriov_pf_provision_apply_vf_priority(struct xe_device *xe, unsigned int vfid, u32 prio); 25 25 int xe_sriov_pf_provision_query_vf_priority(struct xe_device *xe, unsigned int vfid, u32 *prio); 26 26 27 + int xe_sriov_pf_provision_bulk_apply_vram(struct xe_device *xe, u64 size); 28 + int xe_sriov_pf_provision_apply_vf_vram(struct xe_device *xe, unsigned int vfid, u64 size); 29 + int xe_sriov_pf_provision_query_vf_vram(struct xe_device *xe, unsigned int vfid, u64 *size); 30 + 27 31 int xe_sriov_pf_provision_vfs(struct xe_device *xe, unsigned int num_vfs); 28 32 int xe_sriov_pf_unprovision_vfs(struct xe_device *xe, unsigned int num_vfs); 29 33
+29 -2
drivers/gpu/drm/xe/xe_sriov_pf_sysfs.c
··· 9 9 #include <drm/drm_managed.h> 10 10 11 11 #include "xe_assert.h" 12 + #include "xe_device.h" 12 13 #include "xe_pci_sriov.h" 13 14 #include "xe_pm.h" 14 15 #include "xe_sriov.h" ··· 45 44 * ├── .bulk_profile 46 45 * │ ├── exec_quantum_ms 47 46 * │ ├── preempt_timeout_us 48 - * │ └── sched_priority 47 + * │ ├── sched_priority 48 + * │ └── vram_quota 49 49 * ├── pf/ 50 50 * │ ├── ... 51 51 * │ ├── device -> ../../../BDF ··· 61 59 * │ └── profile 62 60 * │ ├── exec_quantum_ms 63 61 * │ ├── preempt_timeout_us 64 - * │ └── sched_priority 62 + * │ ├── sched_priority 63 + * │ └── vram_quota 65 64 * ├── vf2/ 66 65 * : 67 66 * └── vfN/ ··· 135 132 136 133 DEFINE_SIMPLE_BULK_PROVISIONING_SRIOV_DEV_ATTR_WO(exec_quantum_ms, eq, u32); 137 134 DEFINE_SIMPLE_BULK_PROVISIONING_SRIOV_DEV_ATTR_WO(preempt_timeout_us, pt, u32); 135 + DEFINE_SIMPLE_BULK_PROVISIONING_SRIOV_DEV_ATTR_WO(vram_quota, vram, u64); 138 136 139 137 static const char * const sched_priority_names[] = { 140 138 [GUC_SCHED_PRIORITY_LOW] = "low", ··· 185 181 &xe_sriov_dev_attr_exec_quantum_ms.attr, 186 182 &xe_sriov_dev_attr_preempt_timeout_us.attr, 187 183 &xe_sriov_dev_attr_sched_priority.attr, 184 + &xe_sriov_dev_attr_vram_quota.attr, 188 185 NULL 189 186 }; 187 + 188 + static umode_t profile_dev_attr_is_visible(struct kobject *kobj, 189 + struct attribute *attr, int index) 190 + { 191 + struct xe_sriov_kobj *vkobj = to_xe_sriov_kobj(kobj); 192 + 193 + if (attr == &xe_sriov_dev_attr_vram_quota.attr && 194 + !xe_device_has_lmtt(vkobj->xe)) 195 + return 0; 196 + 197 + return attr->mode; 198 + } 190 199 191 200 static const struct attribute_group bulk_profile_dev_attr_group = { 192 201 .name = ".bulk_profile", 193 202 .attrs = bulk_profile_dev_attrs, 203 + .is_visible = profile_dev_attr_is_visible, 194 204 }; 195 205 196 206 static const struct attribute_group *xe_sriov_dev_attr_groups[] = { ··· 246 228 247 229 DEFINE_SIMPLE_PROVISIONING_SRIOV_VF_ATTR(exec_quantum_ms, eq, u32, "%u\n"); 248 230 DEFINE_SIMPLE_PROVISIONING_SRIOV_VF_ATTR(preempt_timeout_us, pt, u32, "%u\n"); 231 + DEFINE_SIMPLE_PROVISIONING_SRIOV_VF_ATTR(vram_quota, vram, u64, "%llu\n"); 249 232 250 233 static ssize_t xe_sriov_vf_attr_sched_priority_show(struct xe_device *xe, unsigned int vfid, 251 234 char *buf) ··· 293 274 &xe_sriov_vf_attr_exec_quantum_ms.attr, 294 275 &xe_sriov_vf_attr_preempt_timeout_us.attr, 295 276 &xe_sriov_vf_attr_sched_priority.attr, 277 + &xe_sriov_vf_attr_vram_quota.attr, 296 278 NULL 297 279 }; 298 280 ··· 305 285 if (attr == &xe_sriov_vf_attr_sched_priority.attr && 306 286 !sched_priority_change_allowed(vkobj->vfid)) 307 287 return attr->mode & 0444; 288 + 289 + if (attr == &xe_sriov_vf_attr_vram_quota.attr) { 290 + if (!IS_DGFX(vkobj->xe) || vkobj->vfid == PFID) 291 + return 0; 292 + if (!xe_device_has_lmtt(vkobj->xe)) 293 + return attr->mode & 0444; 294 + } 308 295 309 296 return attr->mode; 310 297 }
+3
drivers/gpu/drm/xe/xe_sriov_pf_types.h
··· 36 36 * @XE_SRIOV_MODE_PF mode. 37 37 */ 38 38 struct xe_device_pf { 39 + /** @admin_only: PF functionality focused on VFs management only. */ 40 + bool admin_only; 41 + 39 42 /** @device_total_vfs: Maximum number of VFs supported by the device. */ 40 43 u16 device_total_vfs; 41 44
+2 -2
drivers/gpu/drm/xe/xe_tile.h
··· 6 6 #ifndef _XE_TILE_H_ 7 7 #define _XE_TILE_H_ 8 8 9 - #include "xe_device_types.h" 9 + #include "xe_tile_types.h" 10 10 11 + struct xe_device; 11 12 struct xe_pagemap; 12 - struct xe_tile; 13 13 14 14 int xe_tile_init_early(struct xe_tile *tile, struct xe_device *xe, u8 id); 15 15 int xe_tile_init_noalloc(struct xe_tile *tile);
+4 -194
drivers/gpu/drm/xe/xe_tile_sriov_vf.c
··· 14 14 #include "xe_tile_sriov_vf.h" 15 15 #include "xe_wopcm.h" 16 16 17 - static int vf_init_ggtt_balloons(struct xe_tile *tile) 18 - { 19 - struct xe_ggtt *ggtt = tile->mem.ggtt; 20 - 21 - xe_tile_assert(tile, IS_SRIOV_VF(tile_to_xe(tile))); 22 - 23 - tile->sriov.vf.ggtt_balloon[0] = xe_ggtt_node_init(ggtt); 24 - if (IS_ERR(tile->sriov.vf.ggtt_balloon[0])) 25 - return PTR_ERR(tile->sriov.vf.ggtt_balloon[0]); 26 - 27 - tile->sriov.vf.ggtt_balloon[1] = xe_ggtt_node_init(ggtt); 28 - if (IS_ERR(tile->sriov.vf.ggtt_balloon[1])) { 29 - xe_ggtt_node_fini(tile->sriov.vf.ggtt_balloon[0]); 30 - return PTR_ERR(tile->sriov.vf.ggtt_balloon[1]); 31 - } 32 - 33 - return 0; 34 - } 35 - 36 - /** 37 - * xe_tile_sriov_vf_balloon_ggtt_locked - Insert balloon nodes to limit used GGTT address range. 38 - * @tile: the &xe_tile struct instance 39 - * 40 - * Return: 0 on success or a negative error code on failure. 41 - */ 42 - static int xe_tile_sriov_vf_balloon_ggtt_locked(struct xe_tile *tile) 43 - { 44 - u64 ggtt_base = tile->sriov.vf.self_config.ggtt_base; 45 - u64 ggtt_size = tile->sriov.vf.self_config.ggtt_size; 46 - struct xe_device *xe = tile_to_xe(tile); 47 - u64 wopcm = xe_wopcm_size(xe); 48 - u64 start, end; 49 - int err; 50 - 51 - xe_tile_assert(tile, IS_SRIOV_VF(xe)); 52 - xe_tile_assert(tile, ggtt_size); 53 - lockdep_assert_held(&tile->mem.ggtt->lock); 54 - 55 - /* 56 - * VF can only use part of the GGTT as allocated by the PF: 57 - * 58 - * WOPCM GUC_GGTT_TOP 59 - * |<------------ Total GGTT size ------------------>| 60 - * 61 - * VF GGTT base -->|<- size ->| 62 - * 63 - * +--------------------+----------+-----------------+ 64 - * |////////////////////| block |\\\\\\\\\\\\\\\\\| 65 - * +--------------------+----------+-----------------+ 66 - * 67 - * |<--- balloon[0] --->|<-- VF -->|<-- balloon[1] ->| 68 - */ 69 - 70 - if (ggtt_base < wopcm || ggtt_base > GUC_GGTT_TOP || 71 - ggtt_size > GUC_GGTT_TOP - ggtt_base) { 72 - xe_sriov_err(xe, "tile%u: Invalid GGTT configuration: %#llx-%#llx\n", 73 - tile->id, ggtt_base, ggtt_base + ggtt_size - 1); 74 - return -ERANGE; 75 - } 76 - 77 - start = wopcm; 78 - end = ggtt_base; 79 - if (end != start) { 80 - err = xe_ggtt_node_insert_balloon_locked(tile->sriov.vf.ggtt_balloon[0], 81 - start, end); 82 - if (err) 83 - return err; 84 - } 85 - 86 - start = ggtt_base + ggtt_size; 87 - end = GUC_GGTT_TOP; 88 - if (end != start) { 89 - err = xe_ggtt_node_insert_balloon_locked(tile->sriov.vf.ggtt_balloon[1], 90 - start, end); 91 - if (err) { 92 - xe_ggtt_node_remove_balloon_locked(tile->sriov.vf.ggtt_balloon[0]); 93 - return err; 94 - } 95 - } 96 - 97 - return 0; 98 - } 99 - 100 - static int vf_balloon_ggtt(struct xe_tile *tile) 101 - { 102 - struct xe_ggtt *ggtt = tile->mem.ggtt; 103 - int err; 104 - 105 - mutex_lock(&ggtt->lock); 106 - err = xe_tile_sriov_vf_balloon_ggtt_locked(tile); 107 - mutex_unlock(&ggtt->lock); 108 - 109 - return err; 110 - } 111 - 112 - /** 113 - * xe_tile_sriov_vf_deballoon_ggtt_locked - Remove balloon nodes. 
114 - * @tile: the &xe_tile struct instance 115 - */ 116 - void xe_tile_sriov_vf_deballoon_ggtt_locked(struct xe_tile *tile) 117 - { 118 - xe_tile_assert(tile, IS_SRIOV_VF(tile_to_xe(tile))); 119 - 120 - xe_ggtt_node_remove_balloon_locked(tile->sriov.vf.ggtt_balloon[1]); 121 - xe_ggtt_node_remove_balloon_locked(tile->sriov.vf.ggtt_balloon[0]); 122 - } 123 - 124 - static void vf_deballoon_ggtt(struct xe_tile *tile) 125 - { 126 - mutex_lock(&tile->mem.ggtt->lock); 127 - xe_tile_sriov_vf_deballoon_ggtt_locked(tile); 128 - mutex_unlock(&tile->mem.ggtt->lock); 129 - } 130 - 131 - static void vf_fini_ggtt_balloons(struct xe_tile *tile) 132 - { 133 - xe_tile_assert(tile, IS_SRIOV_VF(tile_to_xe(tile))); 134 - 135 - xe_ggtt_node_fini(tile->sriov.vf.ggtt_balloon[1]); 136 - xe_ggtt_node_fini(tile->sriov.vf.ggtt_balloon[0]); 137 - } 138 - 139 - static void cleanup_ggtt(struct drm_device *drm, void *arg) 140 - { 141 - struct xe_tile *tile = arg; 142 - 143 - vf_deballoon_ggtt(tile); 144 - vf_fini_ggtt_balloons(tile); 145 - } 146 - 147 - /** 148 - * xe_tile_sriov_vf_prepare_ggtt - Prepare a VF's GGTT configuration. 149 - * @tile: the &xe_tile 150 - * 151 - * This function is for VF use only. 152 - * 153 - * Return: 0 on success or a negative error code on failure. 154 - */ 155 - int xe_tile_sriov_vf_prepare_ggtt(struct xe_tile *tile) 156 - { 157 - struct xe_device *xe = tile_to_xe(tile); 158 - int err; 159 - 160 - err = vf_init_ggtt_balloons(tile); 161 - if (err) 162 - return err; 163 - 164 - err = vf_balloon_ggtt(tile); 165 - if (err) { 166 - vf_fini_ggtt_balloons(tile); 167 - return err; 168 - } 169 - 170 - return drmm_add_action_or_reset(&xe->drm, cleanup_ggtt, tile); 171 - } 172 - 173 17 /** 174 18 * DOC: GGTT nodes shifting during VF post-migration recovery 175 19 * 176 20 * The first fixup applied to the VF KMD structures as part of post-migration 177 21 * recovery is shifting nodes within &xe_ggtt instance. The nodes are moved 178 22 * from range previously assigned to this VF, into newly provisioned area. 179 - * The changes include balloons, which are resized accordingly. 180 - * 181 - * The balloon nodes are there to eliminate unavailable ranges from use: one 182 - * reserves the GGTT area below the range for current VF, and another one 183 - * reserves area above. 184 23 * 185 24 * Below is a GGTT layout of example VF, with a certain address range assigned to 186 25 * said VF, and inaccessible areas above and below: ··· 36 197 * Hardware enforced access rules before migration: 37 198 * 38 199 * |<------- inaccessible for VF ------->|<VF owned>|<-- inaccessible for VF ->| 39 - * 40 - * GGTT nodes used for tracking allocations: 41 - * 42 - * |<---------- balloon ------------>|<- nodes->|<----- balloon ------>| 43 200 * 44 201 * After the migration, GGTT area assigned to the VF might have shifted, either 45 202 * to lower or to higher address. But we expect the total size and extra areas to ··· 54 219 * So the VF has a new slice of GGTT assigned, and during migration process, the 55 220 * memory content was copied to that new area. But the &xe_ggtt nodes are still 56 221 * tracking allocations using the old addresses. The nodes within VF owned area 57 - * have to be shifted, and balloon nodes need to be resized to properly mask out 58 - * areas not owned by the VF. 222 + * have to be shifted, and the start offset for GGTT adjusted. 
59 223 * 60 - * Fixed &xe_ggtt nodes used for tracking allocations: 61 - * 62 - * |<------ balloon ------>|<- nodes->|<----------- balloon ----------->| 63 - * 64 - * Due to use of GPU profiles, we do not expect the old and new GGTT ares to 224 + * Due to use of GPU profiles, we do not expect the old and new GGTT areas to 65 225 * overlap; but our node shifting will fix addresses properly regardless. 66 226 */ 67 - 68 - /** 69 - * xe_tile_sriov_vf_fixup_ggtt_nodes_locked - Shift GGTT allocations to match assigned range. 70 - * @tile: the &xe_tile struct instance 71 - * @shift: the shift value 72 - * 73 - * Since Global GTT is not virtualized, each VF has an assigned range 74 - * within the global space. This range might have changed during migration, 75 - * which requires all memory addresses pointing to GGTT to be shifted. 76 - */ 77 - void xe_tile_sriov_vf_fixup_ggtt_nodes_locked(struct xe_tile *tile, s64 shift) 78 - { 79 - struct xe_ggtt *ggtt = tile->mem.ggtt; 80 - 81 - lockdep_assert_held(&ggtt->lock); 82 - 83 - xe_tile_sriov_vf_deballoon_ggtt_locked(tile); 84 - xe_ggtt_shift_nodes_locked(ggtt, shift); 85 - xe_tile_sriov_vf_balloon_ggtt_locked(tile); 86 - } 87 227 88 228 /** 89 229 * xe_tile_sriov_vf_lmem - VF LMEM configuration. ··· 140 330 141 331 xe_tile_assert(tile, IS_SRIOV_VF(tile_to_xe(tile))); 142 332 143 - return config->ggtt_base; 333 + return READ_ONCE(config->ggtt_base); 144 334 } 145 335 146 336 /** ··· 156 346 157 347 xe_tile_assert(tile, IS_SRIOV_VF(tile_to_xe(tile))); 158 348 159 - config->ggtt_base = ggtt_base; 349 + WRITE_ONCE(config->ggtt_base, ggtt_base); 160 350 }
-3
drivers/gpu/drm/xe/xe_tile_sriov_vf.h
··· 10 10 11 11 struct xe_tile; 12 12 13 - int xe_tile_sriov_vf_prepare_ggtt(struct xe_tile *tile); 14 - void xe_tile_sriov_vf_deballoon_ggtt_locked(struct xe_tile *tile); 15 - void xe_tile_sriov_vf_fixup_ggtt_nodes_locked(struct xe_tile *tile, s64 shift); 16 13 u64 xe_tile_sriov_vf_ggtt(struct xe_tile *tile); 17 14 void xe_tile_sriov_vf_ggtt_store(struct xe_tile *tile, u64 ggtt_size); 18 15 u64 xe_tile_sriov_vf_ggtt_base(struct xe_tile *tile);
+1 -1
drivers/gpu/drm/xe/xe_tile_sysfs.c
··· 7 7 #include <linux/sysfs.h> 8 8 #include <drm/drm_managed.h> 9 9 10 + #include "xe_device_types.h" 10 11 #include "xe_pm.h" 11 - #include "xe_tile.h" 12 12 #include "xe_tile_sysfs.h" 13 13 #include "xe_vram_freq.h" 14 14
+141
drivers/gpu/drm/xe/xe_tile_types.h
··· 1 + /* SPDX-License-Identifier: MIT */ 2 + /* 3 + * Copyright © 2022-2026 Intel Corporation 4 + */ 5 + 6 + #ifndef _XE_TILE_TYPES_H_ 7 + #define _XE_TILE_TYPES_H_ 8 + 9 + #include <linux/mutex_types.h> 10 + #include <linux/workqueue_types.h> 11 + 12 + #include "xe_lmtt_types.h" 13 + #include "xe_memirq_types.h" 14 + #include "xe_mert.h" 15 + #include "xe_mmio_types.h" 16 + #include "xe_tile_sriov_vf_types.h" 17 + 18 + #define tile_to_xe(tile__) \ 19 + _Generic(tile__, \ 20 + const struct xe_tile * : (const struct xe_device *)((tile__)->xe), \ 21 + struct xe_tile * : (tile__)->xe) 22 + 23 + /** 24 + * struct xe_tile - hardware tile structure 25 + * 26 + * From a driver perspective, a "tile" is effectively a complete GPU, containing 27 + * an SGunit, 1-2 GTs, and (for discrete platforms) VRAM. 28 + * 29 + * Multi-tile platforms effectively bundle multiple GPUs behind a single PCI 30 + * device and designate one "root" tile as being responsible for external PCI 31 + * communication. PCI BAR0 exposes the GGTT and MMIO register space for each 32 + * tile in a stacked layout, and PCI BAR2 exposes the local memory associated 33 + * with each tile similarly. Device-wide interrupts can be enabled/disabled 34 + * at the root tile, and the MSTR_TILE_INTR register will report which tiles 35 + * have interrupts that need servicing. 36 + */ 37 + struct xe_tile { 38 + /** @xe: Backpointer to tile's PCI device */ 39 + struct xe_device *xe; 40 + 41 + /** @id: ID of the tile */ 42 + u8 id; 43 + 44 + /** 45 + * @primary_gt: Primary GT 46 + */ 47 + struct xe_gt *primary_gt; 48 + 49 + /** 50 + * @media_gt: Media GT 51 + * 52 + * Only present on devices with media version >= 13. 53 + */ 54 + struct xe_gt *media_gt; 55 + 56 + /** 57 + * @mmio: MMIO info for a tile. 58 + * 59 + * Each tile has its own 16MB space in BAR0, laid out as: 60 + * * 0-4MB: registers 61 + * * 4MB-8MB: reserved 62 + * * 8MB-16MB: global GTT 63 + */ 64 + struct xe_mmio mmio; 65 + 66 + /** @mem: memory management info for tile */ 67 + struct { 68 + /** 69 + * @mem.kernel_vram: kernel-dedicated VRAM info for tile. 70 + * 71 + * Although VRAM is associated with a specific tile, it can 72 + * still be accessed by all tiles' GTs. 73 + */ 74 + struct xe_vram_region *kernel_vram; 75 + 76 + /** 77 + * @mem.vram: general purpose VRAM info for tile. 78 + * 79 + * Although VRAM is associated with a specific tile, it can 80 + * still be accessed by all tiles' GTs. 81 + */ 82 + struct xe_vram_region *vram; 83 + 84 + /** @mem.ggtt: Global graphics translation table */ 85 + struct xe_ggtt *ggtt; 86 + 87 + /** 88 + * @mem.kernel_bb_pool: Pool from which batchbuffers are allocated. 89 + * 90 + * Media GT shares a pool with its primary GT. 91 + */ 92 + struct xe_sa_manager *kernel_bb_pool; 93 + 94 + /** 95 + * @mem.reclaim_pool: Pool for PRLs allocated. 96 + * 97 + * Only main GT has page reclaim list allocations. 98 + */ 99 + struct xe_sa_manager *reclaim_pool; 100 + } mem; 101 + 102 + /** @sriov: tile level virtualization data */ 103 + union { 104 + struct { 105 + /** @sriov.pf.lmtt: Local Memory Translation Table. */ 106 + struct xe_lmtt lmtt; 107 + } pf; 108 + struct { 109 + /** @sriov.vf.ggtt_balloon: GGTT regions excluded from use. */ 110 + struct xe_ggtt_node *ggtt_balloon[2]; 111 + /** @sriov.vf.self_config: VF configuration data */ 112 + struct xe_tile_sriov_vf_selfconfig self_config; 113 + } vf; 114 + } sriov; 115 + 116 + /** @memirq: Memory Based Interrupts. 
*/ 117 + struct xe_memirq memirq; 118 + 119 + /** @csc_hw_error_work: worker to report CSC HW errors */ 120 + struct work_struct csc_hw_error_work; 121 + 122 + /** @pcode: tile's PCODE */ 123 + struct { 124 + /** @pcode.lock: protecting tile's PCODE mailbox data */ 125 + struct mutex lock; 126 + } pcode; 127 + 128 + /** @migrate: Migration helper for vram blits and clearing */ 129 + struct xe_migrate *migrate; 130 + 131 + /** @sysfs: sysfs' kobj used by xe_tile_sysfs */ 132 + struct kobject *sysfs; 133 + 134 + /** @debugfs: debugfs directory associated with this tile */ 135 + struct dentry *debugfs; 136 + 137 + /** @mert: MERT-related data */ 138 + struct xe_mert mert; 139 + }; 140 + 141 + #endif
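Note: the tile_to_xe() definition above uses _Generic so that a const-qualified tile pointer yields a const device pointer. A tiny illustration, assuming a tile pointer is already in scope:

const struct xe_tile *ct = tile;
const struct xe_device *cxe = tile_to_xe(ct);	/* const branch selected */
struct xe_device *xe = tile_to_xe(tile);	/* non-const branch */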
+33
drivers/gpu/drm/xe/xe_tlb_inval.c
···
  static void
  xe_tlb_inval_fence_signal(struct xe_tlb_inval_fence *fence)
  {
+ 	struct xe_tlb_inval *tlb_inval = fence->tlb_inval;
  	bool stack = test_bit(FENCE_STACK_BIT, &fence->base.flags);
  
  	lockdep_assert_held(&fence->tlb_inval->pending_lock);
  
  	list_del(&fence->link);
+ 	if (list_empty(&tlb_inval->pending_fences))
+ 		cancel_delayed_work(&tlb_inval->fence_tdr);
  	trace_xe_tlb_inval_fence_signal(fence->tlb_inval->xe, fence);
  	xe_tlb_inval_fence_fini(fence);
  	dma_fence_signal(&fence->base);
···
  	xe_tlb_inval_reset(tlb_inval);
  }
  
+ static void primelockdep(struct xe_tlb_inval *tlb_inval)
+ {
+ 	if (!IS_ENABLED(CONFIG_LOCKDEP))
+ 		return;
+ 
+ 	fs_reclaim_acquire(GFP_KERNEL);
+ 	might_lock(&tlb_inval->seqno_lock);
+ 	fs_reclaim_release(GFP_KERNEL);
+ }
+ 
  /**
   * xe_gt_tlb_inval_init_early() - Initialize TLB invalidation state
   * @gt: GT structure
···
  	err = drmm_mutex_init(&xe->drm, &tlb_inval->seqno_lock);
  	if (err)
  		return err;
+ 
+ 	primelockdep(tlb_inval);
  
  	tlb_inval->job_wq = drmm_alloc_ordered_workqueue(&xe->drm,
  							 "gt-tbl-inval-job-wq",
···
  	else
  		dma_fence_get(&fence->base);
  	fence->tlb_inval = tlb_inval;
+ }
+ 
+ /**
+  * xe_tlb_inval_idle() - Check whether TLB invalidation is idle
+  * @tlb_inval: TLB invalidation client
+  *
+  * Check the TLB invalidation seqno to determine if it is idle (i.e., no TLB
+  * invalidations are in flight). Expected to be called in the backend after the
+  * fence has been added to the pending list, and takes this into account.
+  *
+  * Return: True if TLB invalidation client is idle, False otherwise
+  */
+ bool xe_tlb_inval_idle(struct xe_tlb_inval *tlb_inval)
+ {
+ 	lockdep_assert_held(&tlb_inval->seqno_lock);
+ 
+ 	guard(spinlock_irq)(&tlb_inval->pending_lock);
+ 	return list_is_singular(&tlb_inval->pending_fences);
+ }
+2
drivers/gpu/drm/xe/xe_tlb_inval.h
··· 43 43 44 44 void xe_tlb_inval_done_handler(struct xe_tlb_inval *tlb_inval, int seqno); 45 45 46 + bool xe_tlb_inval_idle(struct xe_tlb_inval *tlb_inval); 47 + 46 48 #endif /* _XE_TLB_INVAL_ */
+3 -13
drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
··· 82 82 if (place->flags & TTM_PL_FLAG_TOPDOWN) 83 83 vres->flags |= GPU_BUDDY_TOPDOWN_ALLOCATION; 84 84 85 + if (place->flags & TTM_PL_FLAG_CONTIGUOUS) 86 + vres->flags |= GPU_BUDDY_CONTIGUOUS_ALLOCATION; 87 + 85 88 if (place->fpfn || lpfn != man->size >> PAGE_SHIFT) 86 89 vres->flags |= GPU_BUDDY_RANGE_ALLOCATION; 87 90 ··· 114 111 goto error_unlock; 115 112 } 116 113 117 - if (place->fpfn + (size >> PAGE_SHIFT) != lpfn && 118 - place->flags & TTM_PL_FLAG_CONTIGUOUS) { 119 - size = roundup_pow_of_two(size); 120 - min_page_size = size; 121 - 122 - lpfn = max_t(unsigned long, place->fpfn + (size >> PAGE_SHIFT), lpfn); 123 - } 124 - 125 114 err = gpu_buddy_alloc_blocks(mm, (u64)place->fpfn << PAGE_SHIFT, 126 115 (u64)lpfn << PAGE_SHIFT, size, 127 116 min_page_size, &vres->blocks, vres->flags); 128 117 if (err) 129 118 goto error_unlock; 130 - 131 - if (place->flags & TTM_PL_FLAG_CONTIGUOUS) { 132 - if (!gpu_buddy_block_trim(mm, NULL, vres->base.size, &vres->blocks)) 133 - size = vres->base.size; 134 - } 135 119 136 120 if (lpfn <= mgr->visible_size >> PAGE_SHIFT) { 137 121 vres->used_visible_size = size;
+30 -6
drivers/gpu/drm/xe/xe_tuning.c
··· 10 10 #include <drm/drm_managed.h> 11 11 #include <drm/drm_print.h> 12 12 13 + #include "regs/xe_engine_regs.h" 13 14 #include "regs/xe_gt_regs.h" 14 15 #include "xe_gt_types.h" 15 16 #include "xe_platform_types.h" 16 17 #include "xe_rtp.h" 18 + #include "xe_sriov.h" 17 19 18 20 #undef XE_REG_MCR 19 21 #define XE_REG_MCR(...) XE_REG(__VA_ARGS__, .mcr = 1) ··· 33 31 /* Xe2 */ 34 32 35 33 { XE_RTP_NAME("Tuning: L3 cache"), 36 - XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, XE_RTP_END_VERSION_UNDEFINED)), 34 + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, 3499)), 37 35 XE_RTP_ACTIONS(FIELD_SET(XEHP_L3SQCREG5, L3_PWM_TIMER_INIT_VAL_MASK, 38 36 REG_FIELD_PREP(L3_PWM_TIMER_INIT_VAL_MASK, 0x7f))) 39 37 }, 40 38 { XE_RTP_NAME("Tuning: L3 cache - media"), 41 - XE_RTP_RULES(MEDIA_VERSION_RANGE(2000, XE_RTP_END_VERSION_UNDEFINED)), 39 + XE_RTP_RULES(MEDIA_VERSION_RANGE(2000, 3499)), 42 40 XE_RTP_ACTIONS(FIELD_SET(XE2LPM_L3SQCREG5, L3_PWM_TIMER_INIT_VAL_MASK, 43 41 REG_FIELD_PREP(L3_PWM_TIMER_INIT_VAL_MASK, 0x7f))) 44 42 }, ··· 54 52 SET(XE2LPM_CCCHKNREG1, L3CMPCTRL)) 55 53 }, 56 54 { XE_RTP_NAME("Tuning: Enable compressible partial write overfetch in L3"), 57 - XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, XE_RTP_END_VERSION_UNDEFINED)), 55 + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, 3499)), 58 56 XE_RTP_ACTIONS(SET(L3SQCREG3, COMPPWOVERFETCHEN)) 59 57 }, 60 58 { XE_RTP_NAME("Tuning: Enable compressible partial write overfetch in L3 - media"), ··· 91 89 XE_RTP_RULES(MEDIA_VERSION(2000)), 92 90 XE_RTP_ACTIONS(SET(XE2LPM_SCRATCH3_LBCF, RWFLUSHALLEN)) 93 91 }, 92 + 93 + /* Xe3p */ 94 + 95 + { XE_RTP_NAME("Tuning: Set STLB Bank Hash Mode to 4KB"), 96 + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(3510, XE_RTP_END_VERSION_UNDEFINED), 97 + IS_INTEGRATED), 98 + XE_RTP_ACTIONS(FIELD_SET(XEHP_GAMSTLB_CTRL, BANK_HASH_MODE, 99 + BANK_HASH_4KB_MODE)) 100 + }, 94 101 }; 95 102 96 103 static const struct xe_rtp_entry_sr engine_tunings[] = { ··· 118 107 FUNC(xe_rtp_match_first_render_or_compute)), 119 108 XE_RTP_ACTIONS(SET(RT_CTRL, DIS_NULL_QUERY)) 120 109 }, 110 + { XE_RTP_NAME("Tuning: disable HW reporting of ctx switch to GHWSP"), 111 + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(3500, XE_RTP_END_VERSION_UNDEFINED)), 112 + XE_RTP_ACTIONS(SET(CSFE_CHICKEN1(0), 113 + GHWSP_CSB_REPORT_DIS, 114 + XE_RTP_ACTION_FLAG(ENGINE_BASE))) 115 + }, 121 116 }; 122 117 123 118 static const struct xe_rtp_entry_sr lrc_tunings[] = { 119 + { XE_RTP_NAME("Tuning: Windower HW Filtering"), 120 + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(3000, 3599), ENGINE_CLASS(RENDER)), 121 + XE_RTP_ACTIONS(SET(COMMON_SLICE_CHICKEN4, HW_FILTERING)) 122 + }, 123 + 124 124 /* DG2 */ 125 125 126 126 { XE_RTP_NAME("Tuning: L3 cache"), ··· 206 184 xe_rtp_process_ctx_enable_active_tracking(&ctx, 207 185 gt->tuning_active.gt, 208 186 ARRAY_SIZE(gt_tunings)); 209 - xe_rtp_process_to_sr(&ctx, gt_tunings, ARRAY_SIZE(gt_tunings), &gt->reg_sr); 187 + xe_rtp_process_to_sr(&ctx, gt_tunings, ARRAY_SIZE(gt_tunings), 188 + &gt->reg_sr, false); 210 189 } 211 190 EXPORT_SYMBOL_IF_KUNIT(xe_tuning_process_gt); 212 191 ··· 219 196 hwe->gt->tuning_active.engine, 220 197 ARRAY_SIZE(engine_tunings)); 221 198 xe_rtp_process_to_sr(&ctx, engine_tunings, ARRAY_SIZE(engine_tunings), 222 - &hwe->reg_sr); 199 + &hwe->reg_sr, false); 223 200 } 224 201 EXPORT_SYMBOL_IF_KUNIT(xe_tuning_process_engine); 225 202 ··· 238 215 xe_rtp_process_ctx_enable_active_tracking(&ctx, 239 216 hwe->gt->tuning_active.lrc, 240 217 ARRAY_SIZE(lrc_tunings)); 241 - xe_rtp_process_to_sr(&ctx, lrc_tunings, ARRAY_SIZE(lrc_tunings), 
&hwe->reg_lrc); 218 + xe_rtp_process_to_sr(&ctx, lrc_tunings, ARRAY_SIZE(lrc_tunings), 219 + &hwe->reg_lrc, true); 242 220 } 243 221 244 222 /**
+5 -5
drivers/gpu/drm/xe/xe_uc.c
··· 13 13 #include "xe_gt_sriov_vf.h" 14 14 #include "xe_guc.h" 15 15 #include "xe_guc_pc.h" 16 + #include "xe_guc_rc.h" 16 17 #include "xe_guc_engine_activity.h" 17 18 #include "xe_huc.h" 18 19 #include "xe_sriov.h" ··· 215 214 if (ret) 216 215 goto err_out; 217 216 217 + ret = xe_guc_rc_enable(&uc->guc); 218 + if (ret) 219 + goto err_out; 220 + 218 221 xe_guc_engine_activity_enable_stats(&uc->guc); 219 222 220 223 /* We don't fail the driver load if HuC fails to auth */ ··· 245 240 return 0; 246 241 247 242 return xe_guc_reset_prepare(&uc->guc); 248 - } 249 - 250 - void xe_uc_gucrc_disable(struct xe_uc *uc) 251 - { 252 - XE_WARN_ON(xe_guc_pc_gucrc_disable(&uc->guc.pc)); 253 243 } 254 244 255 245 void xe_uc_stop_prepare(struct xe_uc *uc)
-1
drivers/gpu/drm/xe/xe_uc.h
··· 12 12 int xe_uc_init(struct xe_uc *uc); 13 13 int xe_uc_init_post_hwconfig(struct xe_uc *uc); 14 14 int xe_uc_load_hw(struct xe_uc *uc); 15 - void xe_uc_gucrc_disable(struct xe_uc *uc); 16 15 int xe_uc_reset_prepare(struct xe_uc *uc); 17 16 void xe_uc_runtime_resume(struct xe_uc *uc); 18 17 void xe_uc_runtime_suspend(struct xe_uc *uc);
+83 -5
drivers/gpu/drm/xe/xe_vm.c
··· 1112 1112 struct xe_vma *vma = container_of(cb, struct xe_vma, destroy_cb); 1113 1113 1114 1114 INIT_WORK(&vma->destroy_work, vma_destroy_work_func); 1115 - queue_work(system_unbound_wq, &vma->destroy_work); 1115 + queue_work(system_dfl_wq, &vma->destroy_work); 1116 1116 } 1117 1117 1118 1118 static void xe_vma_destroy(struct xe_vma *vma, struct dma_fence *fence) ··· 1474 1474 } 1475 1475 } 1476 1476 1477 + static void xe_vm_init_prove_locking(struct xe_device *xe, struct xe_vm *vm) 1478 + { 1479 + if (!IS_ENABLED(CONFIG_PROVE_LOCKING)) 1480 + return; 1481 + 1482 + fs_reclaim_acquire(GFP_KERNEL); 1483 + might_lock(&vm->exec_queues.lock); 1484 + fs_reclaim_release(GFP_KERNEL); 1485 + 1486 + down_read(&vm->exec_queues.lock); 1487 + might_lock(&xe_root_mmio_gt(xe)->uc.guc.ct.lock); 1488 + up_read(&vm->exec_queues.lock); 1489 + } 1490 + 1477 1491 struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags, struct xe_file *xef) 1478 1492 { 1479 1493 struct drm_gem_object *vm_resv_obj; ··· 1543 1529 INIT_WORK(&vm->destroy_work, vm_destroy_work_func); 1544 1530 1545 1531 INIT_LIST_HEAD(&vm->preempt.exec_queues); 1532 + for (id = 0; id < XE_MAX_TILES_PER_DEVICE * XE_MAX_GT_PER_TILE; ++id) 1533 + INIT_LIST_HEAD(&vm->exec_queues.list[id]); 1546 1534 if (flags & XE_VM_FLAG_FAULT_MODE) 1547 1535 vm->preempt.min_run_period_ms = xe->min_run_period_pf_ms; 1548 1536 else 1549 1537 vm->preempt.min_run_period_ms = xe->min_run_period_lr_ms; 1538 + 1539 + init_rwsem(&vm->exec_queues.lock); 1540 + xe_vm_init_prove_locking(xe, vm); 1550 1541 1551 1542 for_each_tile(tile, xe, id) 1552 1543 xe_range_fence_tree_init(&vm->rftree[id]); ··· 1657 1638 if (!vm->pt_root[id]) 1658 1639 continue; 1659 1640 1641 + if (!xef) /* Not from userspace */ 1642 + create_flags |= EXEC_QUEUE_FLAG_KERNEL; 1643 + 1660 1644 q = xe_exec_queue_create_bind(xe, tile, vm, create_flags, 0); 1661 1645 if (IS_ERR(q)) { 1662 1646 err = PTR_ERR(q); ··· 1675 1653 down_write(&xe->usm.lock); 1676 1654 err = xa_alloc_cyclic(&xe->usm.asid_to_vm, &asid, vm, 1677 1655 XA_LIMIT(1, XE_MAX_ASID - 1), 1678 - &xe->usm.next_asid, GFP_KERNEL); 1656 + &xe->usm.next_asid, GFP_NOWAIT); 1679 1657 up_write(&xe->usm.lock); 1680 1658 if (err < 0) 1681 1659 goto err_close; ··· 1897 1875 struct xe_vm *vm = container_of(gpuvm, struct xe_vm, gpuvm); 1898 1876 1899 1877 /* To destroy the VM we need to be able to sleep */ 1900 - queue_work(system_unbound_wq, &vm->destroy_work); 1878 + queue_work(system_dfl_wq, &vm->destroy_work); 1901 1879 } 1902 1880 1903 1881 struct xe_vm *xe_vm_lookup(struct xe_file *xef, u32 id) ··· 1941 1919 1942 1920 #define ALL_DRM_XE_VM_CREATE_FLAGS (DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE | \ 1943 1921 DRM_XE_VM_CREATE_FLAG_LR_MODE | \ 1944 - DRM_XE_VM_CREATE_FLAG_FAULT_MODE) 1922 + DRM_XE_VM_CREATE_FLAG_FAULT_MODE | \ 1923 + DRM_XE_VM_CREATE_FLAG_NO_VM_OVERCOMMIT) 1945 1924 1946 1925 int xe_vm_create_ioctl(struct drm_device *dev, void *data, 1947 1926 struct drm_file *file) ··· 1981 1958 args->flags & DRM_XE_VM_CREATE_FLAG_FAULT_MODE)) 1982 1959 return -EINVAL; 1983 1960 1961 + if (XE_IOCTL_DBG(xe, !(args->flags & DRM_XE_VM_CREATE_FLAG_FAULT_MODE) && 1962 + args->flags & DRM_XE_VM_CREATE_FLAG_NO_VM_OVERCOMMIT)) 1963 + return -EINVAL; 1964 + 1984 1965 if (args->flags & DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE) 1985 1966 flags |= XE_VM_FLAG_SCRATCH_PAGE; 1986 1967 if (args->flags & DRM_XE_VM_CREATE_FLAG_LR_MODE) 1987 1968 flags |= XE_VM_FLAG_LR_MODE; 1988 1969 if (args->flags & DRM_XE_VM_CREATE_FLAG_FAULT_MODE) 1989 1970 flags |= XE_VM_FLAG_FAULT_MODE; 
1971 + if (args->flags & DRM_XE_VM_CREATE_FLAG_NO_VM_OVERCOMMIT) 1972 + flags |= XE_VM_FLAG_NO_VM_OVERCOMMIT; 1990 1973 1991 1974 vm = xe_vm_create(xe, flags, xef); 1992 1975 if (IS_ERR(vm)) ··· 2913 2884 err = drm_exec_lock_obj(exec, &bo->ttm.base); 2914 2885 if (!err && validate) 2915 2886 err = xe_bo_validate(bo, vm, 2916 - !xe_vm_in_preempt_fence_mode(vm) && 2887 + xe_vm_allow_vm_eviction(vm) && 2917 2888 res_evict, exec); 2918 2889 } 2919 2890 ··· 4600 4571 return xe_vm_alloc_vma(vm, &map_req, false); 4601 4572 } 4602 4573 4574 + /** 4575 + * xe_vm_add_exec_queue() - Add exec queue to VM 4576 + * @vm: The VM. 4577 + * @q: The exec_queue 4578 + * 4579 + * Add exec queue to VM, skipped if the device does not have context based TLB 4580 + * invalidations. 4581 + */ 4582 + void xe_vm_add_exec_queue(struct xe_vm *vm, struct xe_exec_queue *q) 4583 + { 4584 + struct xe_device *xe = vm->xe; 4585 + 4586 + /* User VMs and queues only */ 4587 + xe_assert(xe, !(q->flags & EXEC_QUEUE_FLAG_KERNEL)); 4588 + xe_assert(xe, !(q->flags & EXEC_QUEUE_FLAG_PERMANENT)); 4589 + xe_assert(xe, !(q->flags & EXEC_QUEUE_FLAG_VM)); 4590 + xe_assert(xe, !(q->flags & EXEC_QUEUE_FLAG_MIGRATE)); 4591 + xe_assert(xe, vm->xef); 4592 + xe_assert(xe, vm == q->vm); 4593 + 4594 + if (!xe->info.has_ctx_tlb_inval) 4595 + return; 4596 + 4597 + down_write(&vm->exec_queues.lock); 4598 + list_add(&q->vm_exec_queue_link, &vm->exec_queues.list[q->gt->info.id]); 4599 + ++vm->exec_queues.count[q->gt->info.id]; 4600 + up_write(&vm->exec_queues.lock); 4601 + } 4602 + 4603 + /** 4604 + * xe_vm_remove_exec_queue() - Remove exec queue from VM 4605 + * @vm: The VM. 4606 + * @q: The exec_queue 4607 + * 4608 + * Remove exec queue from VM, skipped if the device does not have context based 4609 + * TLB invalidations. 4610 + */ 4611 + void xe_vm_remove_exec_queue(struct xe_vm *vm, struct xe_exec_queue *q) 4612 + { 4613 + if (!vm->xe->info.has_ctx_tlb_inval) 4614 + return; 4615 + 4616 + down_write(&vm->exec_queues.lock); 4617 + if (!list_empty(&q->vm_exec_queue_link)) { 4618 + list_del(&q->vm_exec_queue_link); 4619 + --vm->exec_queues.count[q->gt->info.id]; 4620 + } 4621 + up_write(&vm->exec_queues.lock); 4622 + }
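Note on the new vm->exec_queues bookkeeping above: it keeps one list and one counter per GT, guarded by an rw_semaphore, so queue add/remove (writers) and lookups (readers) can coexist. A minimal sketch of how a reader might walk one of these lists follows; walk_vm_exec_queues() and its use for context-based TLB invalidation are illustrative assumptions, not code from this series.

/* Hypothetical reader of the per-GT exec queue list added above. */
static void walk_vm_exec_queues(struct xe_vm *vm, struct xe_gt *gt)
{
	struct xe_exec_queue *q;

	down_read(&vm->exec_queues.lock);
	list_for_each_entry(q, &vm->exec_queues.list[gt->info.id],
			    vm_exec_queue_link) {
		/* e.g. issue a context-based TLB invalidation for @q */
	}
	up_read(&vm->exec_queues.lock);
}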
+10
drivers/gpu/drm/xe/xe_vm.h
··· 220 220 return xe_vm_in_lr_mode(vm) && !xe_vm_in_fault_mode(vm); 221 221 } 222 222 223 + static inline bool xe_vm_allow_vm_eviction(struct xe_vm *vm) 224 + { 225 + return !xe_vm_in_lr_mode(vm) || 226 + (xe_vm_in_fault_mode(vm) && 227 + !(vm->flags & XE_VM_FLAG_NO_VM_OVERCOMMIT)); 228 + } 229 + 223 230 int xe_vm_add_compute_exec_queue(struct xe_vm *vm, struct xe_exec_queue *q); 224 231 void xe_vm_remove_compute_exec_queue(struct xe_vm *vm, struct xe_exec_queue *q); 225 232 ··· 293 286 } 294 287 295 288 void xe_vm_kill(struct xe_vm *vm, bool unlocked); 289 + 290 + void xe_vm_add_exec_queue(struct xe_vm *vm, struct xe_exec_queue *q); 291 + void xe_vm_remove_exec_queue(struct xe_vm *vm, struct xe_exec_queue *q); 296 292 297 293 /** 298 294 * xe_vm_assert_held(vm) - Assert that the vm's reservation object is held.
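Spelled out, the new xe_vm_allow_vm_eviction() helper distinguishes the VM modes as below; this summary is an editorial annotation derived from the expression above, not part of the patch.

/*
 * xe_vm_allow_vm_eviction() summary (annotation, not from the patch):
 *  - dma-fence VM (not LR mode):              eviction allowed
 *  - LR fault-mode VM, overcommit permitted:  eviction allowed
 *  - LR fault-mode VM, XE_VM_FLAG_NO_VM_OVERCOMMIT set: eviction disallowed
 *  - LR preempt-fence VM:                     eviction disallowed
 */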
+17
drivers/gpu/drm/xe/xe_vm_types.h
··· 232 232 #define XE_VM_FLAG_TILE_ID(flags) FIELD_GET(GENMASK(7, 6), flags) 233 233 #define XE_VM_FLAG_SET_TILE_ID(tile) FIELD_PREP(GENMASK(7, 6), (tile)->id) 234 234 #define XE_VM_FLAG_GSC BIT(8) 235 + #define XE_VM_FLAG_NO_VM_OVERCOMMIT BIT(9) 235 236 unsigned long flags; 236 237 237 238 /** ··· 298 297 */ 299 298 struct list_head pm_activate_link; 300 299 } preempt; 300 + 301 + /** @exec_queues: Manages list of exec queues attached to this VM, protected by lock. */ 302 + struct { 303 + /** 304 + * @exec_queues.list: list of exec queues attached to this VM, 305 + * per GT 306 + */ 307 + struct list_head list[XE_MAX_TILES_PER_DEVICE * XE_MAX_GT_PER_TILE]; 308 + /** 309 + * @exec_queues.count: count of exec queues attached to this VM, 310 + * per GT 311 + */ 312 + int count[XE_MAX_TILES_PER_DEVICE * XE_MAX_GT_PER_TILE]; 313 + /** @exec_queues.lock: lock to protect exec_queues list */ 314 + struct rw_semaphore lock; 315 + } exec_queues; 301 316 302 317 /** @um: unified memory state */ 303 318 struct {
+1 -1
drivers/gpu/drm/xe/xe_vram_freq.c
··· 5 5 #include <linux/sysfs.h> 6 6 #include <drm/drm_managed.h> 7 7 8 + #include "xe_device_types.h" 8 9 #include "xe_pcode.h" 9 10 #include "xe_pcode_api.h" 10 - #include "xe_tile.h" 11 11 #include "xe_tile_sysfs.h" 12 12 #include "xe_vram_freq.h" 13 13
+158 -245
drivers/gpu/drm/xe/xe_wa.c
··· 111 111 * difference of how they are maintained in the code. In xe it uses the 112 112 * xe_rtp infrastructure so the workarounds can be kept in tables, following 113 113 * a more declarative approach rather than procedural. 114 + * 115 + * .. note:: 116 + * When a workaround applies to every single known IP version in a range, 117 + * the preferred handling is to use a single range-based RTP entry rather 118 + * than individual entries for each version, even if some of the intermediate 119 + * version numbers are currently unused. If a new intermediate IP version 120 + * appears in the future and is enabled in the driver, any existing 121 + * range-based entries that contain the new version number will need to be 122 + * analyzed to determine whether their workarounds should apply to the new 123 + * version, or whether any existing range based entries needs to be split 124 + * into two entries that do not include the new intermediate version. 114 125 */ 115 126 116 127 #undef XE_REG_MCR ··· 131 120 __diag_ignore_all("-Woverride-init", "Allow field overrides in table"); 132 121 133 122 static const struct xe_rtp_entry_sr gt_was[] = { 123 + /* Workarounds applying over a range of IPs */ 124 + 134 125 { XE_RTP_NAME("14011060649"), 135 126 XE_RTP_RULES(MEDIA_VERSION_RANGE(1200, 1255), 136 127 ENGINE_CLASS(VIDEO_DECODE), ··· 147 134 { XE_RTP_NAME("14015795083"), 148 135 XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1200, 1260)), 149 136 XE_RTP_ACTIONS(CLR(MISCCPCTL, DOP_CLOCK_GATE_RENDER_ENABLE)) 137 + }, 138 + { XE_RTP_NAME("16021867713"), 139 + XE_RTP_RULES(MEDIA_VERSION_RANGE(1300, 3002), 140 + ENGINE_CLASS(VIDEO_DECODE)), 141 + XE_RTP_ACTIONS(SET(VDBOX_CGCTL3F1C(0), MFXPIPE_CLKGATE_DIS)), 142 + XE_RTP_ENTRY_FLAG(FOREACH_ENGINE), 143 + }, 144 + { XE_RTP_NAME("14019449301"), 145 + XE_RTP_RULES(MEDIA_VERSION_RANGE(1301, 2000), ENGINE_CLASS(VIDEO_DECODE)), 146 + XE_RTP_ACTIONS(SET(VDBOX_CGCTL3F08(0), CG3DDISHRS_CLKGATE_DIS)), 147 + XE_RTP_ENTRY_FLAG(FOREACH_ENGINE), 148 + }, 149 + { XE_RTP_NAME("16028005424"), 150 + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(3000, 3005), OR, 151 + MEDIA_VERSION_RANGE(1301, 3500)), 152 + XE_RTP_ACTIONS(SET(GUC_INTR_CHICKEN, DISABLE_SIGNALING_ENGINES)) 150 153 }, 151 154 152 155 /* DG1 */ ··· 220 191 221 192 /* Xe_LPG */ 222 193 223 - { XE_RTP_NAME("14015795083"), 224 - XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1270, 1271), GRAPHICS_STEP(A0, B0)), 225 - XE_RTP_ACTIONS(CLR(MISCCPCTL, DOP_CLOCK_GATE_RENDER_ENABLE)) 226 - }, 227 194 { XE_RTP_NAME("14018575942"), 228 195 XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1270, 1274)), 229 196 XE_RTP_ACTIONS(SET(COMP_MOD_CTRL, FORCE_MISS_FTLB)) ··· 231 206 232 207 /* Xe_LPM+ */ 233 208 234 - { XE_RTP_NAME("16021867713"), 235 - XE_RTP_RULES(MEDIA_VERSION(1300), 236 - ENGINE_CLASS(VIDEO_DECODE)), 237 - XE_RTP_ACTIONS(SET(VDBOX_CGCTL3F1C(0), MFXPIPE_CLKGATE_DIS)), 238 - XE_RTP_ENTRY_FLAG(FOREACH_ENGINE), 239 - }, 240 209 { XE_RTP_NAME("22016670082"), 241 210 XE_RTP_RULES(MEDIA_VERSION(1300)), 242 211 XE_RTP_ACTIONS(SET(XELPMP_SQCNT1, ENFORCE_RAR)) ··· 244 225 XE_RTP_ACTIONS(SET(VDBOX_CGCTL3F10(0), IECPUNIT_CLKGATE_DIS)), 245 226 XE_RTP_ENTRY_FLAG(FOREACH_ENGINE), 246 227 }, 247 - { XE_RTP_NAME("16021867713"), 248 - XE_RTP_RULES(MEDIA_VERSION(2000), 249 - ENGINE_CLASS(VIDEO_DECODE)), 250 - XE_RTP_ACTIONS(SET(VDBOX_CGCTL3F1C(0), MFXPIPE_CLKGATE_DIS)), 251 - XE_RTP_ENTRY_FLAG(FOREACH_ENGINE), 252 - }, 253 - { XE_RTP_NAME("14019449301"), 254 - XE_RTP_RULES(MEDIA_VERSION(2000), ENGINE_CLASS(VIDEO_DECODE)), 255 - XE_RTP_ACTIONS(SET(VDBOX_CGCTL3F08(0), 
CG3DDISHRS_CLKGATE_DIS)), 256 - XE_RTP_ENTRY_FLAG(FOREACH_ENGINE), 257 - }, 258 228 259 229 /* Xe2_HPG */ 260 230 261 231 { XE_RTP_NAME("16025250150"), 262 232 XE_RTP_RULES(GRAPHICS_VERSION(2001)), 263 - XE_RTP_ACTIONS(SET(LSN_VC_REG2, 264 - LSN_LNI_WGT(1) | 265 - LSN_LNE_WGT(1) | 266 - LSN_DIM_X_WGT(1) | 267 - LSN_DIM_Y_WGT(1) | 268 - LSN_DIM_Z_WGT(1))) 269 - }, 270 - 271 - /* Xe2_HPM */ 272 - 273 - { XE_RTP_NAME("16021867713"), 274 - XE_RTP_RULES(MEDIA_VERSION(1301), 275 - ENGINE_CLASS(VIDEO_DECODE)), 276 - XE_RTP_ACTIONS(SET(VDBOX_CGCTL3F1C(0), MFXPIPE_CLKGATE_DIS)), 277 - XE_RTP_ENTRY_FLAG(FOREACH_ENGINE), 278 - }, 279 - { XE_RTP_NAME("14019449301"), 280 - XE_RTP_RULES(MEDIA_VERSION(1301), ENGINE_CLASS(VIDEO_DECODE)), 281 - XE_RTP_ACTIONS(SET(VDBOX_CGCTL3F08(0), CG3DDISHRS_CLKGATE_DIS)), 282 - XE_RTP_ENTRY_FLAG(FOREACH_ENGINE), 233 + XE_RTP_ACTIONS(FIELD_SET(LSN_VC_REG2, 234 + LSN_LNI_WGT_MASK | LSN_LNE_WGT_MASK | 235 + LSN_DIM_X_WGT_MASK | LSN_DIM_Y_WGT_MASK | 236 + LSN_DIM_Z_WGT_MASK, 237 + LSN_LNI_WGT(1) | LSN_LNE_WGT(1) | 238 + LSN_DIM_X_WGT(1) | LSN_DIM_Y_WGT(1) | 239 + LSN_DIM_Z_WGT(1))) 283 240 }, 284 241 285 242 /* Xe3_LPG */ ··· 267 272 268 273 /* Xe3_LPM */ 269 274 270 - { XE_RTP_NAME("16021867713"), 271 - XE_RTP_RULES(MEDIA_VERSION(3000), 272 - ENGINE_CLASS(VIDEO_DECODE)), 273 - XE_RTP_ACTIONS(SET(VDBOX_CGCTL3F1C(0), MFXPIPE_CLKGATE_DIS)), 274 - XE_RTP_ENTRY_FLAG(FOREACH_ENGINE), 275 - }, 276 275 { XE_RTP_NAME("16021865536"), 277 - XE_RTP_RULES(MEDIA_VERSION(3000), 276 + XE_RTP_RULES(MEDIA_VERSION_RANGE(3000, 3002), 278 277 ENGINE_CLASS(VIDEO_DECODE)), 279 278 XE_RTP_ACTIONS(SET(VDBOX_CGCTL3F10(0), IECPUNIT_CLKGATE_DIS)), 280 - XE_RTP_ENTRY_FLAG(FOREACH_ENGINE), 281 - }, 282 - { XE_RTP_NAME("16021865536"), 283 - XE_RTP_RULES(MEDIA_VERSION(3002), 284 - ENGINE_CLASS(VIDEO_DECODE)), 285 - XE_RTP_ACTIONS(SET(VDBOX_CGCTL3F10(0), IECPUNIT_CLKGATE_DIS)), 286 - XE_RTP_ENTRY_FLAG(FOREACH_ENGINE), 287 - }, 288 - { XE_RTP_NAME("16021867713"), 289 - XE_RTP_RULES(MEDIA_VERSION(3002), 290 - ENGINE_CLASS(VIDEO_DECODE)), 291 - XE_RTP_ACTIONS(SET(VDBOX_CGCTL3F1C(0), MFXPIPE_CLKGATE_DIS)), 292 279 XE_RTP_ENTRY_FLAG(FOREACH_ENGINE), 293 280 }, 294 281 { XE_RTP_NAME("14021486841"), ··· 279 302 XE_RTP_ACTIONS(SET(VDBOX_CGCTL3F10(0), RAMDFTUNIT_CLKGATE_DIS)), 280 303 XE_RTP_ENTRY_FLAG(FOREACH_ENGINE), 281 304 }, 305 + 306 + /* Xe3P_LPG */ 307 + 308 + { XE_RTP_NAME("14025160223"), 309 + XE_RTP_RULES(GRAPHICS_VERSION(3510), GRAPHICS_STEP(A0, B0)), 310 + XE_RTP_ACTIONS(SET(MMIOATSREQLIMIT_GAM_WALK_3D, 311 + DIS_ATS_WRONLY_PG)) 312 + }, 313 + { XE_RTP_NAME("16028780921"), 314 + XE_RTP_RULES(GRAPHICS_VERSION(3510), GRAPHICS_STEP(A0, B0)), 315 + XE_RTP_ACTIONS(SET(CCCHKNREG2, LOCALITYDIS)) 316 + }, 317 + { XE_RTP_NAME("14026144927"), 318 + XE_RTP_RULES(GRAPHICS_VERSION(3510), GRAPHICS_STEP(A0, B0)), 319 + XE_RTP_ACTIONS(SET(L3SQCREG2, L3_SQ_DISABLE_COAMA_2WAY_COH | 320 + L3_SQ_DISABLE_COAMA)) 321 + }, 322 + { XE_RTP_NAME("14025635424"), 323 + XE_RTP_RULES(GRAPHICS_VERSION(3510), GRAPHICS_STEP(A0, B0)), 324 + XE_RTP_ACTIONS(SET(GAMSTLB_CTRL2, STLB_SINGLE_BANK_MODE)) 325 + }, 282 326 { XE_RTP_NAME("16028005424"), 283 - XE_RTP_RULES(GRAPHICS_VERSION_RANGE(3000, 3005)), 327 + XE_RTP_RULES(GRAPHICS_VERSION(3510), GRAPHICS_STEP(A0, B0)), 284 328 XE_RTP_ACTIONS(SET(GUC_INTR_CHICKEN, DISABLE_SIGNALING_ENGINES)) 285 329 }, 286 330 }; 287 331 288 332 static const struct xe_rtp_entry_sr engine_was[] = { 333 + /* Workarounds applying over a range of IPs */ 334 + 289 335 { XE_RTP_NAME("22010931296, 18011464164, 
14010919138"), 290 336 XE_RTP_RULES(GRAPHICS_VERSION(1200), ENGINE_CLASS(RENDER)), 291 337 XE_RTP_ACTIONS(SET(FF_THREAD_MODE(RENDER_RING_BASE), ··· 343 343 XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1200, 1250), ENGINE_CLASS(RENDER)), 344 344 XE_RTP_ACTIONS(SET(FF_SLICE_CS_CHICKEN1(RENDER_RING_BASE), 345 345 FFSC_PERCTX_PREEMPT_CTRL)) 346 + }, 347 + { XE_RTP_NAME("18032247524"), 348 + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, 2004), 349 + FUNC(xe_rtp_match_first_render_or_compute)), 350 + XE_RTP_ACTIONS(SET(LSC_CHICKEN_BIT_0, SEQUENTIAL_ACCESS_UPGRADE_DISABLE)) 351 + }, 352 + { XE_RTP_NAME("16018712365"), 353 + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, 2004), 354 + FUNC(xe_rtp_match_first_render_or_compute)), 355 + XE_RTP_ACTIONS(SET(LSC_CHICKEN_BIT_0_UDW, XE2_ALLOC_DPA_STARVE_FIX_DIS)) 356 + }, 357 + { XE_RTP_NAME("14020338487"), 358 + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, 2004), 359 + FUNC(xe_rtp_match_first_render_or_compute)), 360 + XE_RTP_ACTIONS(SET(ROW_CHICKEN3, XE2_EUPEND_CHK_FLUSH_DIS)) 361 + }, 362 + { XE_RTP_NAME("14018471104"), 363 + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, 2004), 364 + FUNC(xe_rtp_match_first_render_or_compute)), 365 + XE_RTP_ACTIONS(SET(LSC_CHICKEN_BIT_0_UDW, ENABLE_SMP_LD_RENDER_SURFACE_CONTROL)) 366 + }, 367 + /* 368 + * Although this workaround isn't required for the RCS, disabling these 369 + * reports has no impact for our driver or the GuC, so we go ahead and 370 + * apply this to all engines for simplicity. 371 + */ 372 + { XE_RTP_NAME("16021639441"), 373 + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, 2004), OR, 374 + MEDIA_VERSION_RANGE(1301, 2000)), 375 + XE_RTP_ACTIONS(SET(CSFE_CHICKEN1(0), 376 + GHWSP_CSB_REPORT_DIS | 377 + PPHWSP_CSB_AND_TIMESTAMP_REPORT_DIS, 378 + XE_RTP_ACTION_FLAG(ENGINE_BASE))) 379 + }, 380 + { XE_RTP_NAME("14021402888"), 381 + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, 3005), ENGINE_CLASS(RENDER)), 382 + XE_RTP_ACTIONS(SET(HALF_SLICE_CHICKEN7, CLEAR_OPTIMIZATION_DISABLE)) 383 + }, 384 + { XE_RTP_NAME("13012615864"), 385 + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, 3005), 386 + FUNC(xe_rtp_match_first_render_or_compute)), 387 + XE_RTP_ACTIONS(SET(TDL_TSL_CHICKEN, RES_CHK_SPR_DIS)) 388 + }, 389 + { XE_RTP_NAME("18041344222"), 390 + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, 3000), 391 + FUNC(xe_rtp_match_first_render_or_compute), 392 + FUNC(xe_rtp_match_gt_has_discontiguous_dss_groups)), 393 + XE_RTP_ACTIONS(SET(TDL_CHICKEN, EUSTALL_PERF_SAMPLING_DISABLE)) 346 394 }, 347 395 348 396 /* TGL */ ··· 507 459 ENGINE_CLASS(COMPUTE)), 508 460 XE_RTP_ACTIONS(SET(RING_HWSTAM(RENDER_RING_BASE), ~0)) 509 461 }, 510 - { XE_RTP_NAME("14014999345"), 511 - XE_RTP_RULES(PLATFORM(PVC), ENGINE_CLASS(COMPUTE), 512 - GRAPHICS_STEP(B0, C0)), 513 - XE_RTP_ACTIONS(SET(CACHE_MODE_SS, DISABLE_ECC)) 514 - }, 515 462 516 463 /* Xe_LPG */ 517 464 ··· 529 486 530 487 /* Xe2_LPG */ 531 488 532 - { XE_RTP_NAME("18032247524"), 533 - XE_RTP_RULES(GRAPHICS_VERSION(2004), 534 - FUNC(xe_rtp_match_first_render_or_compute)), 535 - XE_RTP_ACTIONS(SET(LSC_CHICKEN_BIT_0, SEQUENTIAL_ACCESS_UPGRADE_DISABLE)) 536 - }, 537 - { XE_RTP_NAME("16018712365"), 538 - XE_RTP_RULES(GRAPHICS_VERSION(2004), FUNC(xe_rtp_match_first_render_or_compute)), 539 - XE_RTP_ACTIONS(SET(LSC_CHICKEN_BIT_0_UDW, XE2_ALLOC_DPA_STARVE_FIX_DIS)) 540 - }, 541 - { XE_RTP_NAME("14020338487"), 542 - XE_RTP_RULES(GRAPHICS_VERSION(2004), FUNC(xe_rtp_match_first_render_or_compute)), 543 - XE_RTP_ACTIONS(SET(ROW_CHICKEN3, XE2_EUPEND_CHK_FLUSH_DIS)) 544 - }, 545 489 { XE_RTP_NAME("18034896535, 16021540221"), /* 
16021540221: GRAPHICS_STEP(A0, B0) */ 546 490 XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, 2004), 547 491 FUNC(xe_rtp_match_first_render_or_compute)), 548 492 XE_RTP_ACTIONS(SET(ROW_CHICKEN4, DISABLE_TDL_PUSH)) 549 493 }, 550 - { XE_RTP_NAME("14018471104"), 551 - XE_RTP_RULES(GRAPHICS_VERSION(2004), FUNC(xe_rtp_match_first_render_or_compute)), 552 - XE_RTP_ACTIONS(SET(LSC_CHICKEN_BIT_0_UDW, ENABLE_SMP_LD_RENDER_SURFACE_CONTROL)) 553 - }, 554 - /* 555 - * These two workarounds are the same, just applying to different 556 - * engines. Although Wa_18032095049 (for the RCS) isn't required on 557 - * all steppings, disabling these reports has no impact for our 558 - * driver or the GuC, so we go ahead and treat it the same as 559 - * Wa_16021639441 which does apply to all steppings. 560 - */ 561 - { XE_RTP_NAME("18032095049, 16021639441"), 562 - XE_RTP_RULES(GRAPHICS_VERSION(2004)), 563 - XE_RTP_ACTIONS(SET(CSFE_CHICKEN1(0), 564 - GHWSP_CSB_REPORT_DIS | 565 - PPHWSP_CSB_AND_TIMESTAMP_REPORT_DIS, 566 - XE_RTP_ACTION_FLAG(ENGINE_BASE))) 567 - }, 568 494 { XE_RTP_NAME("16018610683"), 569 495 XE_RTP_RULES(GRAPHICS_VERSION(2004), FUNC(xe_rtp_match_first_render_or_compute)), 570 496 XE_RTP_ACTIONS(SET(TDL_TSL_CHICKEN, SLM_WMTP_RESTORE)) 571 497 }, 572 - { XE_RTP_NAME("14021402888"), 573 - XE_RTP_RULES(GRAPHICS_VERSION(2004), ENGINE_CLASS(RENDER)), 574 - XE_RTP_ACTIONS(SET(HALF_SLICE_CHICKEN7, CLEAR_OPTIMIZATION_DISABLE)) 575 - }, 576 - { XE_RTP_NAME("13012615864"), 577 - XE_RTP_RULES(GRAPHICS_VERSION(2004), 578 - FUNC(xe_rtp_match_first_render_or_compute)), 579 - XE_RTP_ACTIONS(SET(TDL_TSL_CHICKEN, RES_CHK_SPR_DIS)) 580 - }, 581 498 582 499 /* Xe2_HPG */ 583 500 584 - { XE_RTP_NAME("16018712365"), 585 - XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, 2002), 586 - FUNC(xe_rtp_match_first_render_or_compute)), 587 - XE_RTP_ACTIONS(SET(LSC_CHICKEN_BIT_0_UDW, XE2_ALLOC_DPA_STARVE_FIX_DIS)) 588 - }, 589 501 { XE_RTP_NAME("16018737384"), 590 - XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, XE_RTP_END_VERSION_UNDEFINED), 502 + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, 2999), 591 503 FUNC(xe_rtp_match_first_render_or_compute)), 592 504 XE_RTP_ACTIONS(SET(ROW_CHICKEN, EARLY_EOT_DIS)) 593 - }, 594 - { XE_RTP_NAME("14020338487"), 595 - XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, 2002), 596 - FUNC(xe_rtp_match_first_render_or_compute)), 597 - XE_RTP_ACTIONS(SET(ROW_CHICKEN3, XE2_EUPEND_CHK_FLUSH_DIS)) 598 - }, 599 - { XE_RTP_NAME("18032247524"), 600 - XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, 2002), 601 - FUNC(xe_rtp_match_first_render_or_compute)), 602 - XE_RTP_ACTIONS(SET(LSC_CHICKEN_BIT_0, SEQUENTIAL_ACCESS_UPGRADE_DISABLE)) 603 - }, 604 - { XE_RTP_NAME("14018471104"), 605 - XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, 2002), 606 - FUNC(xe_rtp_match_first_render_or_compute)), 607 - XE_RTP_ACTIONS(SET(LSC_CHICKEN_BIT_0_UDW, ENABLE_SMP_LD_RENDER_SURFACE_CONTROL)) 608 - }, 609 - /* 610 - * Although this workaround isn't required for the RCS, disabling these 611 - * reports has no impact for our driver or the GuC, so we go ahead and 612 - * apply this to all engines for simplicity. 
613 - */ 614 - { XE_RTP_NAME("16021639441"), 615 - XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, 2002)), 616 - XE_RTP_ACTIONS(SET(CSFE_CHICKEN1(0), 617 - GHWSP_CSB_REPORT_DIS | 618 - PPHWSP_CSB_AND_TIMESTAMP_REPORT_DIS, 619 - XE_RTP_ACTION_FLAG(ENGINE_BASE))) 620 505 }, 621 506 { XE_RTP_NAME("14019811474"), 622 507 XE_RTP_RULES(GRAPHICS_VERSION(2001), 623 508 FUNC(xe_rtp_match_first_render_or_compute)), 624 509 XE_RTP_ACTIONS(SET(LSC_CHICKEN_BIT_0, WR_REQ_CHAINING_DIS)) 625 510 }, 626 - { XE_RTP_NAME("14021402888"), 627 - XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, 2002), ENGINE_CLASS(RENDER)), 628 - XE_RTP_ACTIONS(SET(HALF_SLICE_CHICKEN7, CLEAR_OPTIMIZATION_DISABLE)) 629 - }, 630 511 { XE_RTP_NAME("14021821874, 14022954250"), 631 512 XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, 2002), 632 513 FUNC(xe_rtp_match_first_render_or_compute)), 633 514 XE_RTP_ACTIONS(SET(TDL_TSL_CHICKEN, STK_ID_RESTRICT)) 634 515 }, 635 - { XE_RTP_NAME("13012615864"), 636 - XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, 2002), 637 - FUNC(xe_rtp_match_first_render_or_compute)), 638 - XE_RTP_ACTIONS(SET(TDL_TSL_CHICKEN, RES_CHK_SPR_DIS)) 639 - }, 640 - { XE_RTP_NAME("18041344222"), 641 - XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, 2002), 642 - FUNC(xe_rtp_match_first_render_or_compute), 643 - FUNC(xe_rtp_match_not_sriov_vf), 644 - FUNC(xe_rtp_match_gt_has_discontiguous_dss_groups)), 645 - XE_RTP_ACTIONS(SET(TDL_CHICKEN, EUSTALL_PERF_SAMPLING_DISABLE)) 646 - }, 647 - 648 - /* Xe2_LPM */ 649 - 650 - { XE_RTP_NAME("16021639441"), 651 - XE_RTP_RULES(MEDIA_VERSION(2000)), 652 - XE_RTP_ACTIONS(SET(CSFE_CHICKEN1(0), 653 - GHWSP_CSB_REPORT_DIS | 654 - PPHWSP_CSB_AND_TIMESTAMP_REPORT_DIS, 655 - XE_RTP_ACTION_FLAG(ENGINE_BASE))) 656 - }, 657 - 658 - /* Xe2_HPM */ 659 - 660 - { XE_RTP_NAME("16021639441"), 661 - XE_RTP_RULES(MEDIA_VERSION(1301)), 662 - XE_RTP_ACTIONS(SET(CSFE_CHICKEN1(0), 663 - GHWSP_CSB_REPORT_DIS | 664 - PPHWSP_CSB_AND_TIMESTAMP_REPORT_DIS, 665 - XE_RTP_ACTION_FLAG(ENGINE_BASE))) 666 - }, 667 516 668 517 /* Xe3_LPG */ 669 518 670 - { XE_RTP_NAME("14021402888"), 671 - XE_RTP_RULES(GRAPHICS_VERSION_RANGE(3000, 3001), 672 - FUNC(xe_rtp_match_first_render_or_compute)), 673 - XE_RTP_ACTIONS(SET(HALF_SLICE_CHICKEN7, CLEAR_OPTIMIZATION_DISABLE)) 674 - }, 675 519 { XE_RTP_NAME("18034896535"), 676 520 XE_RTP_RULES(GRAPHICS_VERSION(3000), GRAPHICS_STEP(A0, B0), 677 521 FUNC(xe_rtp_match_first_render_or_compute)), ··· 571 641 SMP_FORCE_128B_OVERFETCH)) 572 642 }, 573 643 { XE_RTP_NAME("14023061436"), 574 - XE_RTP_RULES(GRAPHICS_VERSION_RANGE(3000, 3001), 575 - FUNC(xe_rtp_match_first_render_or_compute), OR, 576 - GRAPHICS_VERSION_RANGE(3003, 3005), 644 + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(3000, 3005), 577 645 FUNC(xe_rtp_match_first_render_or_compute)), 578 646 XE_RTP_ACTIONS(SET(TDL_CHICKEN, QID_WAIT_FOR_THREAD_NOT_RUN_DISABLE)) 579 - }, 580 - { XE_RTP_NAME("13012615864"), 581 - XE_RTP_RULES(GRAPHICS_VERSION_RANGE(3000, 3001), OR, 582 - GRAPHICS_VERSION_RANGE(3003, 3005), 583 - FUNC(xe_rtp_match_first_render_or_compute)), 584 - XE_RTP_ACTIONS(SET(TDL_TSL_CHICKEN, RES_CHK_SPR_DIS)) 585 647 }, 586 648 { XE_RTP_NAME("16023105232"), 587 649 XE_RTP_RULES(MEDIA_VERSION_RANGE(1301, 3000), OR, ··· 581 659 XE_RTP_ACTIONS(SET(RING_PSMI_CTL(0), RC_SEMA_IDLE_MSG_DISABLE, 582 660 XE_RTP_ACTION_FLAG(ENGINE_BASE))) 583 661 }, 584 - { XE_RTP_NAME("14021402888"), 585 - XE_RTP_RULES(GRAPHICS_VERSION_RANGE(3003, 3005), FUNC(xe_rtp_match_first_render_or_compute)), 586 - XE_RTP_ACTIONS(SET(HALF_SLICE_CHICKEN7, CLEAR_OPTIMIZATION_DISABLE)) 662 + 663 + /* 
Xe3p_LPG*/ 664 + 665 + { XE_RTP_NAME("22021149932"), 666 + XE_RTP_RULES(GRAPHICS_VERSION(3510), GRAPHICS_STEP(A0, B0), 667 + FUNC(xe_rtp_match_first_render_or_compute)), 668 + XE_RTP_ACTIONS(SET(LSC_CHICKEN_BIT_0_UDW, SAMPLER_LD_LSC_DISABLE)) 587 669 }, 588 - { XE_RTP_NAME("18041344222"), 589 - XE_RTP_RULES(GRAPHICS_VERSION(3000), 590 - FUNC(xe_rtp_match_first_render_or_compute), 591 - FUNC(xe_rtp_match_not_sriov_vf), 592 - FUNC(xe_rtp_match_gt_has_discontiguous_dss_groups)), 593 - XE_RTP_ACTIONS(SET(TDL_CHICKEN, EUSTALL_PERF_SAMPLING_DISABLE)) 670 + { XE_RTP_NAME("14025676848"), 671 + XE_RTP_RULES(GRAPHICS_VERSION(3510), GRAPHICS_STEP(A0, B0), 672 + FUNC(xe_rtp_match_first_render_or_compute)), 673 + XE_RTP_ACTIONS(SET(LSC_CHICKEN_BIT_0_UDW, LSCFE_SAME_ADDRESS_ATOMICS_COALESCING_DISABLE)) 674 + }, 675 + { XE_RTP_NAME("16028951944"), 676 + XE_RTP_RULES(GRAPHICS_VERSION(3510), GRAPHICS_STEP(A0, B0), 677 + FUNC(xe_rtp_match_first_render_or_compute)), 678 + XE_RTP_ACTIONS(SET(ROW_CHICKEN5, CPSS_AWARE_DIS)) 594 679 }, 595 680 }; 596 681 ··· 635 706 XE_RTP_RULES(GRAPHICS_VERSION(1200)), 636 707 XE_RTP_ACTIONS(SET(COMMON_SLICE_CHICKEN4, DISABLE_TDC_LOAD_BALANCING_CALC)) 637 708 }, 709 + { XE_RTP_NAME("14019877138"), 710 + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1255, 2004), ENGINE_CLASS(RENDER)), 711 + XE_RTP_ACTIONS(SET(XEHP_PSS_CHICKEN, FD_END_COLLECT)) 712 + }, 713 + { XE_RTP_NAME("14019386621"), 714 + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, 2004), ENGINE_CLASS(RENDER)), 715 + XE_RTP_ACTIONS(SET(VF_SCRATCHPAD, XE2_VFG_TED_CREDIT_INTERFACE_DISABLE)) 716 + }, 717 + { XE_RTP_NAME("14019988906"), 718 + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, 2004), ENGINE_CLASS(RENDER)), 719 + XE_RTP_ACTIONS(SET(XEHP_PSS_CHICKEN, FLSH_IGNORES_PSD)) 720 + }, 721 + { XE_RTP_NAME("18033852989"), 722 + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, 2004), ENGINE_CLASS(RENDER)), 723 + XE_RTP_ACTIONS(SET(COMMON_SLICE_CHICKEN1, DISABLE_BOTTOM_CLIP_RECTANGLE_TEST)) 724 + }, 725 + { XE_RTP_NAME("15016589081"), 726 + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, 2004), ENGINE_CLASS(RENDER)), 727 + XE_RTP_ACTIONS(SET(CHICKEN_RASTER_1, DIS_CLIP_NEGATIVE_BOUNDING_BOX)) 728 + }, 638 729 639 730 /* DG1 */ 640 731 ··· 691 742 XE_RTP_RULES(PLATFORM(DG2)), 692 743 XE_RTP_ACTIONS(SET(CACHE_MODE_1, MSAA_OPTIMIZATION_REDUC_DISABLE)) 693 744 }, 694 - { XE_RTP_NAME("14019877138"), 695 - XE_RTP_RULES(PLATFORM(DG2)), 696 - XE_RTP_ACTIONS(SET(XEHP_PSS_CHICKEN, FD_END_COLLECT)) 697 - }, 698 745 699 746 /* PVC */ 700 747 ··· 708 763 XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1270, 1274)), 709 764 XE_RTP_ACTIONS(SET(CACHE_MODE_1, MSAA_OPTIMIZATION_REDUC_DISABLE)) 710 765 }, 711 - { XE_RTP_NAME("14019877138"), 712 - XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1270, 1274), ENGINE_CLASS(RENDER)), 713 - XE_RTP_ACTIONS(SET(XEHP_PSS_CHICKEN, FD_END_COLLECT)) 714 - }, 715 766 716 767 /* Xe2_LPG */ 717 768 718 - { XE_RTP_NAME("14019386621"), 719 - XE_RTP_RULES(GRAPHICS_VERSION(2004), ENGINE_CLASS(RENDER)), 720 - XE_RTP_ACTIONS(SET(VF_SCRATCHPAD, XE2_VFG_TED_CREDIT_INTERFACE_DISABLE)) 721 - }, 722 - { XE_RTP_NAME("14019877138"), 723 - XE_RTP_RULES(GRAPHICS_VERSION(2004), ENGINE_CLASS(RENDER)), 724 - XE_RTP_ACTIONS(SET(XEHP_PSS_CHICKEN, FD_END_COLLECT)) 725 - }, 726 - { XE_RTP_NAME("14019988906"), 727 - XE_RTP_RULES(GRAPHICS_VERSION(2004), ENGINE_CLASS(RENDER)), 728 - XE_RTP_ACTIONS(SET(XEHP_PSS_CHICKEN, FLSH_IGNORES_PSD)) 729 - }, 730 - { XE_RTP_NAME("18033852989"), 731 - XE_RTP_RULES(GRAPHICS_VERSION(2004), ENGINE_CLASS(RENDER)), 732 - 
XE_RTP_ACTIONS(SET(COMMON_SLICE_CHICKEN1, DISABLE_BOTTOM_CLIP_RECTANGLE_TEST)) 733 - }, 734 769 { XE_RTP_NAME("14021567978"), 735 770 XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, XE_RTP_END_VERSION_UNDEFINED), 736 771 ENGINE_CLASS(RENDER)), ··· 730 805 DIS_PARTIAL_AUTOSTRIP | 731 806 DIS_AUTOSTRIP)) 732 807 }, 733 - { XE_RTP_NAME("15016589081"), 734 - XE_RTP_RULES(GRAPHICS_VERSION(2004), ENGINE_CLASS(RENDER)), 735 - XE_RTP_ACTIONS(SET(CHICKEN_RASTER_1, DIS_CLIP_NEGATIVE_BOUNDING_BOX)) 736 - }, 737 808 738 809 /* Xe2_HPG */ 739 - { XE_RTP_NAME("15010599737"), 740 - XE_RTP_RULES(GRAPHICS_VERSION(2001), ENGINE_CLASS(RENDER)), 741 - XE_RTP_ACTIONS(SET(CHICKEN_RASTER_1, DIS_SF_ROUND_NEAREST_EVEN)) 742 - }, 743 - { XE_RTP_NAME("14019386621"), 744 - XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, 2002), ENGINE_CLASS(RENDER)), 745 - XE_RTP_ACTIONS(SET(VF_SCRATCHPAD, XE2_VFG_TED_CREDIT_INTERFACE_DISABLE)) 746 - }, 810 + 747 811 { XE_RTP_NAME("14020756599"), 748 812 XE_RTP_RULES(GRAPHICS_VERSION(2001), ENGINE_CLASS(RENDER)), 749 813 XE_RTP_ACTIONS(SET(WM_CHICKEN3, HIZ_PLANE_COMPRESSION_DIS)) ··· 754 840 DIS_PARTIAL_AUTOSTRIP | 755 841 DIS_AUTOSTRIP)) 756 842 }, 757 - { XE_RTP_NAME("15016589081"), 758 - XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, 2002), ENGINE_CLASS(RENDER)), 759 - XE_RTP_ACTIONS(SET(CHICKEN_RASTER_1, DIS_CLIP_NEGATIVE_BOUNDING_BOX)) 760 - }, 761 843 { XE_RTP_NAME("22021007897"), 762 844 XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, 2002), ENGINE_CLASS(RENDER)), 763 845 XE_RTP_ACTIONS(SET(COMMON_SLICE_CHICKEN4, SBE_PUSH_CONSTANT_BEHIND_FIX_ENABLE)) 764 - }, 765 - { XE_RTP_NAME("18033852989"), 766 - XE_RTP_RULES(GRAPHICS_VERSION(2001), ENGINE_CLASS(RENDER)), 767 - XE_RTP_ACTIONS(SET(COMMON_SLICE_CHICKEN1, DISABLE_BOTTOM_CLIP_RECTANGLE_TEST)) 768 846 }, 769 847 770 848 /* Xe3_LPG */ ··· 782 876 XE_RTP_RULES(GRAPHICS_VERSION(3000), GRAPHICS_STEP(A0, B0), 783 877 ENGINE_CLASS(RENDER)), 784 878 XE_RTP_ACTIONS(SET(CHICKEN_RASTER_1, DIS_CLIP_NEGATIVE_BOUNDING_BOX)) 879 + }, 880 + { XE_RTP_NAME("14026781792"), 881 + XE_RTP_RULES(GRAPHICS_VERSION(3510), ENGINE_CLASS(RENDER)), 882 + XE_RTP_ACTIONS(SET(FF_MODE, DIS_TE_PATCH_CTRL)) 785 883 }, 786 884 }; 787 885 ··· 853 943 854 944 xe_rtp_process_ctx_enable_active_tracking(&ctx, gt->wa_active.gt, 855 945 ARRAY_SIZE(gt_was)); 856 - xe_rtp_process_to_sr(&ctx, gt_was, ARRAY_SIZE(gt_was), &gt->reg_sr); 946 + xe_rtp_process_to_sr(&ctx, gt_was, ARRAY_SIZE(gt_was), 947 + &gt->reg_sr, false); 857 948 } 858 949 EXPORT_SYMBOL_IF_KUNIT(xe_wa_process_gt); 859 950 ··· 872 961 873 962 xe_rtp_process_ctx_enable_active_tracking(&ctx, hwe->gt->wa_active.engine, 874 963 ARRAY_SIZE(engine_was)); 875 - xe_rtp_process_to_sr(&ctx, engine_was, ARRAY_SIZE(engine_was), &hwe->reg_sr); 964 + xe_rtp_process_to_sr(&ctx, engine_was, ARRAY_SIZE(engine_was), 965 + &hwe->reg_sr, false); 876 966 } 877 967 878 968 /** ··· 890 978 891 979 xe_rtp_process_ctx_enable_active_tracking(&ctx, hwe->gt->wa_active.lrc, 892 980 ARRAY_SIZE(lrc_was)); 893 - xe_rtp_process_to_sr(&ctx, lrc_was, ARRAY_SIZE(lrc_was), &hwe->reg_lrc); 981 + xe_rtp_process_to_sr(&ctx, lrc_was, ARRAY_SIZE(lrc_was), 982 + &hwe->reg_lrc, true); 894 983 } 895 984 896 985 /**
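The note added to the file header above is what drives most of the churn in this diff: per-version entries are folded into a single range-based rule whenever a workaround applies to every IP version in the range. Using the Wa_16021867713 entry from this diff as the example, the consolidation looks roughly like this (entries trimmed for brevity):

/* Before: one entry per media IP version (1300, 1301, 2000, 3000, 3002, ...) */
{ XE_RTP_NAME("16021867713"),
  XE_RTP_RULES(MEDIA_VERSION(1300), ENGINE_CLASS(VIDEO_DECODE)),
  XE_RTP_ACTIONS(SET(VDBOX_CGCTL3F1C(0), MFXPIPE_CLKGATE_DIS)),
  XE_RTP_ENTRY_FLAG(FOREACH_ENGINE),
},
/* ...repeated for each individual version... */

/* After: a single range-based entry covering the same versions */
{ XE_RTP_NAME("16021867713"),
  XE_RTP_RULES(MEDIA_VERSION_RANGE(1300, 3002), ENGINE_CLASS(VIDEO_DECODE)),
  XE_RTP_ACTIONS(SET(VDBOX_CGCTL3F1C(0), MFXPIPE_CLKGATE_DIS)),
  XE_RTP_ENTRY_FLAG(FOREACH_ENGINE),
},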
+12 -27
drivers/gpu/drm/xe/xe_wa_oob.rules
··· 2 2 16010904313 GRAPHICS_VERSION_RANGE(1200, 1210) 3 3 18022495364 GRAPHICS_VERSION_RANGE(1200, 1210) 4 4 22012773006 GRAPHICS_VERSION_RANGE(1200, 1250) 5 - 14014475959 GRAPHICS_VERSION_RANGE(1270, 1271), GRAPHICS_STEP(A0, B0) 6 - PLATFORM(DG2) 5 + 14014475959 PLATFORM(DG2) 7 6 22011391025 PLATFORM(DG2) 8 7 22012727170 SUBPLATFORM(DG2, G11) 9 8 22012727685 SUBPLATFORM(DG2, G11) 10 9 22016596838 PLATFORM(PVC) 11 10 18020744125 PLATFORM(PVC) 12 - 1509372804 PLATFORM(PVC), GRAPHICS_STEP(A0, C0) 13 11 1409600907 GRAPHICS_VERSION_RANGE(1200, 1250) 14 12 22014953428 SUBPLATFORM(DG2, G10) 15 13 SUBPLATFORM(DG2, G12) ··· 24 26 MEDIA_VERSION(2000) 25 27 16022287689 GRAPHICS_VERSION(2001) 26 28 GRAPHICS_VERSION(2004) 27 - 13011645652 GRAPHICS_VERSION(2004) 28 - GRAPHICS_VERSION_RANGE(3000, 3001) 29 - GRAPHICS_VERSION(3003) 30 - GRAPHICS_VERSION_RANGE(3004, 3005) 31 - 14022293748 GRAPHICS_VERSION_RANGE(2001, 2002) 32 - GRAPHICS_VERSION(2004) 33 - GRAPHICS_VERSION_RANGE(3000, 3005) 34 - 22019794406 GRAPHICS_VERSION_RANGE(2001, 2002) 35 - GRAPHICS_VERSION(2004) 36 - GRAPHICS_VERSION_RANGE(3000, 3001) 37 - GRAPHICS_VERSION(3003) 38 - GRAPHICS_VERSION_RANGE(3004, 3005) 29 + 13011645652 GRAPHICS_VERSION_RANGE(2004, 3005) 30 + 14022293748 GRAPHICS_VERSION_RANGE(2001, 3005) 31 + 22019794406 GRAPHICS_VERSION_RANGE(2001, 3005) 39 32 22019338487 MEDIA_VERSION(2000) 40 33 GRAPHICS_VERSION(2001), FUNC(xe_rtp_match_not_sriov_vf) 41 34 MEDIA_VERSION(3000), MEDIA_STEP(A0, B0), FUNC(xe_rtp_match_not_sriov_vf) ··· 43 54 18013179988 GRAPHICS_VERSION(1255) 44 55 GRAPHICS_VERSION_RANGE(1270, 1274) 45 56 1508761755 GRAPHICS_VERSION(1255) 46 - GRAPHICS_VERSION(1260), GRAPHICS_STEP(A0, B0) 47 - 16023105232 GRAPHICS_VERSION_RANGE(2001, 3001) 48 - MEDIA_VERSION_RANGE(1301, 3000) 49 - MEDIA_VERSION(3002) 50 - GRAPHICS_VERSION_RANGE(3003, 3005) 51 - 16026508708 GRAPHICS_VERSION_RANGE(1200, 3001) 52 - MEDIA_VERSION_RANGE(1300, 3000) 53 - MEDIA_VERSION(3002) 54 - GRAPHICS_VERSION_RANGE(3003, 3005) 57 + 16023105232 GRAPHICS_VERSION_RANGE(2001, 3005) 58 + MEDIA_VERSION_RANGE(1301, 3002) 59 + 16026508708 GRAPHICS_VERSION_RANGE(1200, 3005) 60 + MEDIA_VERSION_RANGE(1300, 3002) 55 61 14020001231 GRAPHICS_VERSION_RANGE(2001,2004), FUNC(xe_rtp_match_psmi_enabled) 56 - MEDIA_VERSION(2000), FUNC(xe_rtp_match_psmi_enabled) 57 - MEDIA_VERSION(3000), FUNC(xe_rtp_match_psmi_enabled) 58 - MEDIA_VERSION(3002), FUNC(xe_rtp_match_psmi_enabled) 62 + MEDIA_VERSION_RANGE(2000, 3002), FUNC(xe_rtp_match_psmi_enabled) 59 63 16023683509 MEDIA_VERSION(2000), FUNC(xe_rtp_match_psmi_enabled) 60 64 MEDIA_VERSION(3000), MEDIA_STEP(A0, B0), FUNC(xe_rtp_match_psmi_enabled) 61 65 62 66 15015404425_disable PLATFORM(PANTHERLAKE), MEDIA_STEP(B0, FOREVER) 63 67 16026007364 MEDIA_VERSION(3000) 64 68 14020316580 MEDIA_VERSION(1301) 69 + 70 + 14025883347 MEDIA_VERSION_RANGE(1301, 3503) 71 + GRAPHICS_VERSION_RANGE(2004, 3005)
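The OOB rules above are not applied through register save/restore; each entry is turned into a feature bit at build time that driver code tests wherever the workaround has to be implemented by hand. A hedged sketch of that consumer side, using the newly added Wa_14025883347 as the example (the surrounding function name and body are hypothetical):

/* Hypothetical consumer; XE_WA() tests the bit generated from this rules file. */
static void example_guc_reset_prepare(struct xe_gt *gt)
{
	if (XE_WA(gt, 14025883347)) {
		/* apply the manual handling for GuC DMA failure on reset here */
	}
}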
+12 -3
drivers/gpu/drm/xe/xe_wopcm.c
··· 55 55 #define MTL_WOPCM_SIZE SZ_4M 56 56 #define WOPCM_SIZE SZ_2M 57 57 58 - #define MAX_WOPCM_SIZE SZ_8M 59 - 60 58 /* 16KB WOPCM (RSVD WOPCM) is reserved from HuC firmware top. */ 61 59 #define WOPCM_RESERVED_SIZE SZ_16K 62 60 ··· 184 186 WOPCM_SIZE; 185 187 } 186 188 189 + static u32 max_wopcm_size(struct xe_device *xe) 190 + { 191 + if (xe->info.platform == XE_NOVALAKE_P) 192 + return SZ_16M; 193 + else 194 + return SZ_8M; 195 + } 196 + 187 197 /** 188 198 * xe_wopcm_init() - Initialize the WOPCM structure. 189 199 * @wopcm: pointer to xe_wopcm. ··· 233 227 * When the GuC wopcm base and size are preprogrammed by 234 228 * BIOS/IFWI, check against the max allowed wopcm size to 235 229 * validate if the programmed values align to the wopcm layout. 230 + * 231 + * FIXME: This is giving the maximum overall WOPCM size and not 232 + * the size relative to each GT. 236 233 */ 237 - wopcm->size = MAX_WOPCM_SIZE; 234 + wopcm->size = max_wopcm_size(xe); 238 235 239 236 goto check; 240 237 }
+6
include/drm/drm_suballoc.h
··· 53 53 54 54 void drm_suballoc_manager_fini(struct drm_suballoc_manager *sa_manager); 55 55 56 + struct drm_suballoc *drm_suballoc_alloc(gfp_t gfp); 57 + 58 + int drm_suballoc_insert(struct drm_suballoc_manager *sa_manager, 59 + struct drm_suballoc *sa, size_t size, bool intr, 60 + size_t align); 61 + 56 62 struct drm_suballoc * 57 63 drm_suballoc_new(struct drm_suballoc_manager *sa_manager, size_t size, 58 64 gfp_t gfp, bool intr, size_t align);
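The header now exposes the two halves of drm_suballoc_new() separately: drm_suballoc_alloc() only allocates the tracking object, while drm_suballoc_insert() attaches it to a manager. A rough sketch of the intended two-phase usage follows; sa_manager, size and align stand for the caller's values, and the failure cleanup shown is an assumption rather than part of this header:

/* Sketch only: pre-allocate where blocking allocations are still safe,
 * insert into the manager later. */
struct drm_suballoc *sa = drm_suballoc_alloc(GFP_KERNEL);
int ret;

if (!sa)
	return -ENOMEM;

/* Later, possibly under stricter locking/reclaim constraints: */
ret = drm_suballoc_insert(sa_manager, sa, size, true /* intr */, align);
if (ret)
	kfree(sa); /* assumption: caller-side cleanup on insert failure */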
+12
include/drm/intel/pciids.h
··· 900 900 #define INTEL_CRI_IDS(MACRO__, ...) \ 901 901 MACRO__(0x674C, ## __VA_ARGS__) 902 902 903 + /* NVL-P */ 904 + #define INTEL_NVLP_IDS(MACRO__, ...) \ 905 + MACRO__(0xD750, ## __VA_ARGS__), \ 906 + MACRO__(0xD751, ## __VA_ARGS__), \ 907 + MACRO__(0xD752, ## __VA_ARGS__), \ 908 + MACRO__(0xD753, ## __VA_ARGS__), \ 909 + MACRO__(0xD754, ## __VA_ARGS__), \ 910 + MACRO__(0xD755, ## __VA_ARGS__), \ 911 + MACRO__(0xD756, ## __VA_ARGS__), \ 912 + MACRO__(0xD757, ## __VA_ARGS__), \ 913 + MACRO__(0xD75F, ## __VA_ARGS__) 914 + 903 915 #endif /* __PCIIDS_H__ */
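INTEL_NVLP_IDS() follows the existing pattern in this header: it expands a caller-supplied macro once per device ID, so a driver can splice the whole list into its PCI match table. A minimal illustration is below; the XE_PCI_DEVICE macro and the nvlp_desc descriptor are made up for the example and are not part of this patch.

#define XE_PCI_DEVICE(id, info) \
	{ PCI_DEVICE(PCI_VENDOR_ID_INTEL, (id)), \
	  .driver_data = (kernel_ulong_t)(info) }

static const struct pci_device_id example_ids[] = {
	INTEL_NVLP_IDS(XE_PCI_DEVICE, &nvlp_desc),
	{ }
};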
+9 -1
include/linux/migrate.h
··· 65 65 66 66 int migrate_huge_page_move_mapping(struct address_space *mapping, 67 67 struct folio *dst, struct folio *src); 68 - void migration_entry_wait_on_locked(softleaf_t entry, spinlock_t *ptl) 68 + void softleaf_entry_wait_on_locked(softleaf_t entry, spinlock_t *ptl) 69 69 __releases(ptl); 70 70 void folio_migrate_flags(struct folio *newfolio, struct folio *folio); 71 71 int folio_migrate_mapping(struct address_space *mapping, ··· 95 95 static inline int set_movable_ops(const struct movable_operations *ops, enum pagetype type) 96 96 { 97 97 return -ENOSYS; 98 + } 99 + 100 + static inline void softleaf_entry_wait_on_locked(softleaf_t entry, spinlock_t *ptl) 101 + __releases(ptl) 102 + { 103 + WARN_ON_ONCE(1); 104 + 105 + spin_unlock(ptl); 98 106 } 99 107 100 108 #endif /* CONFIG_MIGRATION */
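The renamed helper keeps the old calling convention: it must be entered with the page-table lock held and always returns with it released, which is what the __releases(ptl) annotation and the unconditional spin_unlock() in the !CONFIG_MIGRATION stub express. A condensed caller sketch, modelled on the call sites updated later in this series (the surrounding code is illustrative):

/* Sketch of the expected calling pattern. */
spinlock_t *ptl = pmd_lock(mm, pmd);

if (!pmd_is_migration_entry(*pmd)) {
	spin_unlock(ptl);
	return;
}
/* Sleeps until the entry is gone; drops ptl before returning. */
softleaf_entry_wait_on_locked(softleaf_from_pmd(*pmd), ptl);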
+7 -7
include/uapi/drm/xe_drm.h
··· 335 335 __u64 total_size; 336 336 /** 337 337 * @used: Estimate of the memory used in bytes for this region. 338 - * 339 - * Requires CAP_PERFMON or CAP_SYS_ADMIN to get reliable 340 - * accounting. Without this the value here will always equal 341 - * zero. 342 338 */ 343 339 __u64 used; 344 340 /** ··· 359 363 * @cpu_visible_used: Estimate of CPU visible memory used, in 360 364 * bytes. 361 365 * 362 - * Requires CAP_PERFMON or CAP_SYS_ADMIN to get reliable 363 - * accounting. Without this the value here will always equal 364 - * zero. Note this is only currently tracked for 366 + * Note this is only currently tracked for 365 367 * DRM_XE_MEM_REGION_CLASS_VRAM regions (for other types the value 366 368 * here will always be zero). 367 369 */ ··· 969 975 * demand when accessed, and also allows per-VM overcommit of memory. 970 976 * The xe driver internally uses recoverable pagefaults to implement 971 977 * this. 978 + * - %DRM_XE_VM_CREATE_FLAG_NO_VM_OVERCOMMIT - Requires also 979 + * DRM_XE_VM_CREATE_FLAG_FAULT_MODE. This disallows per-VM overcommit 980 + * but only during a &DRM_IOCTL_XE_VM_BIND operation with the 981 + * %DRM_XE_VM_BIND_FLAG_IMMEDIATE flag set. This may be useful for 982 + * user-space naively probing the amount of available memory. 972 983 */ 973 984 struct drm_xe_vm_create { 974 985 /** @extensions: Pointer to the first extension struct, if any */ ··· 982 983 #define DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE (1 << 0) 983 984 #define DRM_XE_VM_CREATE_FLAG_LR_MODE (1 << 1) 984 985 #define DRM_XE_VM_CREATE_FLAG_FAULT_MODE (1 << 2) 986 + #define DRM_XE_VM_CREATE_FLAG_NO_VM_OVERCOMMIT (1 << 3) 985 987 /** @flags: Flags */ 986 988 __u32 flags; 987 989
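For reference, a minimal user-space sketch of creating a VM with the new flag; the render-node path is an assumption and most error handling is omitted. Fault mode is only valid together with LR mode, so all three flags are set:

/* Minimal sketch; not a complete or robust example. */
#include <fcntl.h>
#include <sys/ioctl.h>
#include <drm/xe_drm.h>

static int create_no_overcommit_vm(int *out_fd)
{
	struct drm_xe_vm_create create = {
		.flags = DRM_XE_VM_CREATE_FLAG_LR_MODE |
			 DRM_XE_VM_CREATE_FLAG_FAULT_MODE |
			 DRM_XE_VM_CREATE_FLAG_NO_VM_OVERCOMMIT,
	};
	int fd = open("/dev/dri/renderD128", O_RDWR);

	if (fd < 0 || ioctl(fd, DRM_IOCTL_XE_VM_CREATE, &create))
		return -1;

	*out_fd = fd; /* keep the fd open while the VM is in use */
	return create.vm_id;
}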
+10 -5
mm/filemap.c
··· 1379 1379 1380 1380 #ifdef CONFIG_MIGRATION 1381 1381 /** 1382 - * migration_entry_wait_on_locked - Wait for a migration entry to be removed 1383 - * @entry: migration swap entry. 1382 + * softleaf_entry_wait_on_locked - Wait for a migration entry or 1383 + * device_private entry to be removed. 1384 + * @entry: migration or device_private swap entry. 1384 1385 * @ptl: already locked ptl. This function will drop the lock. 1385 1386 * 1386 - * Wait for a migration entry referencing the given page to be removed. This is 1387 + * Wait for a migration entry referencing the given page, or device_private 1388 + * entry referencing a device_private page to be unlocked. This is 1387 1389 * equivalent to folio_put_wait_locked(folio, TASK_UNINTERRUPTIBLE) except 1388 1390 * this can be called without taking a reference on the page. Instead this 1389 1391 * should be called while holding the ptl for the migration entry referencing 1390 1392 * the page. 1391 1393 * 1392 1394 * Returns after unlocking the ptl. ··· 1396 1394 * This follows the same logic as folio_wait_bit_common() so see the comments 1397 1395 * there. 1398 1396 */ 1399 - void migration_entry_wait_on_locked(softleaf_t entry, spinlock_t *ptl) 1397 + void softleaf_entry_wait_on_locked(softleaf_t entry, spinlock_t *ptl) 1400 1398 __releases(ptl) 1401 1399 { 1402 1400 struct wait_page_queue wait_page; ··· 1430 1428 * If a migration entry exists for the page the migration path must hold 1431 1429 * a valid reference to the page, and it must take the ptl to remove the 1432 1430 * migration entry. So the page is valid until the ptl is dropped. 1431 + * Similarly any path attempting to drop the last reference to a 1432 + * device-private page needs to grab the ptl to remove the device-private 1433 + * entry. 1433 1434 */ 1434 1435 spin_unlock(ptl); 1435 1436
+2 -1
mm/memory.c
··· 4763 4763 unlock_page(vmf->page); 4764 4764 put_page(vmf->page); 4765 4765 } else { 4766 - pte_unmap_unlock(vmf->pte, vmf->ptl); 4766 + pte_unmap(vmf->pte); 4767 + softleaf_entry_wait_on_locked(entry, vmf->ptl); 4767 4768 } 4768 4769 } else if (softleaf_is_hwpoison(entry)) { 4769 4770 ret = VM_FAULT_HWPOISON;
+4 -4
mm/migrate.c
··· 500 500 if (!softleaf_is_migration(entry)) 501 501 goto out; 502 502 503 - migration_entry_wait_on_locked(entry, ptl); 503 + softleaf_entry_wait_on_locked(entry, ptl); 504 504 return; 505 505 out: 506 506 spin_unlock(ptl); ··· 532 532 * If migration entry existed, safe to release vma lock 533 533 * here because the pgtable page won't be freed without the 534 534 * pgtable lock released. See comment right above pgtable 535 - * lock release in migration_entry_wait_on_locked(). 535 + * lock release in softleaf_entry_wait_on_locked(). 536 536 */ 537 537 hugetlb_vma_unlock_read(vma); 538 - migration_entry_wait_on_locked(entry, ptl); 538 + softleaf_entry_wait_on_locked(entry, ptl); 539 539 return; 540 540 } 541 541 ··· 553 553 ptl = pmd_lock(mm, pmd); 554 554 if (!pmd_is_migration_entry(*pmd)) 555 555 goto unlock; 556 - migration_entry_wait_on_locked(softleaf_from_pmd(*pmd), ptl); 556 + softleaf_entry_wait_on_locked(softleaf_from_pmd(*pmd), ptl); 557 557 return; 558 558 unlock: 559 559 spin_unlock(ptl);
+1 -1
mm/migrate_device.c
··· 176 176 } 177 177 178 178 if (softleaf_is_migration(entry)) { 179 - migration_entry_wait_on_locked(entry, ptl); 179 + softleaf_entry_wait_on_locked(entry, ptl); 180 180 spin_unlock(ptl); 181 181 return -EAGAIN; 182 182 }