Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'drm-xe-next-2025-12-19' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next

[airlied: fix guc submit double definition]
UAPI Changes:
- Multi-Queue support (Niranjana)
- Add DRM_XE_EXEC_QUEUE_SET_HANG_REPLAY_STATE (Brost)
- Add NO_COMPRESSION BO flag and query capability (Sanjay)
- Add gt_id to struct drm_xe_oa_unit (Ashutosh)
- Expose MERT OA unit (Ashutosh)
- Sysfs Survivability refactor (Riana)

Cross-subsystem Changes:
- VFIO: Add device specific vfio_pci driver variant for Intel graphics (Winiarski)

Driver Changes:
- MAINTAINERS update (Lucas -> Matt)
- Add helper to query compression enable status (Xin)
- Xe_VM fixes and updates (Shuicheng, Himal)
- Documentation fixes (Winiarski, Swaraj, Niranjana)
- Kunit fix (Roper)
- Fix potential leaks, uaf, null deref, and oversized allocations (Shuicheng, Sanjay, Mika, Tapani)
- Other minor fixes like kbuild duplication and sysfs_emit (Shuicheng, Madhur)
- Handle msix vector0 interrupt (Venkata)
- Scope-based forcewake and runtime PM (Roper, Raag)
- GuC/HuC related fixes and refactors (Lucas, Zhanjun, Brost, Julia, Wajdeczko)
- Fix conversion from clock ticks to milliseconds (Harish)
- SRIOV PF: Add support for MERT (Lukasz)
- Enable SR-IOV VF migration and other SRIOV updates (Winiarski,
Satya, Brost, Wajdeczko, Piotr, Tomasz, Daniele)
- Optimize runtime suspend/resume and other PM improvements (Raag)
- Some W/a additions and updates (Bala, Harish, Roper)
- Use for_each_tlb_inval() to calculate invalidation fences (Roper)
- Fix VFIO link error (Arnd)
- Fix drm_gpusvm_init() arguments (Arnd)
- Other OA refactor (Ashutosh)
- Refactor PAT and expose debugfs (Xin)
- Enable Indirect Ring State for xe3p_xpc (Niranjana)
- MEI interrupt fix (Junxiao)
- Add stats for mode switching on hw_engine_group (Francois)
- DMA-Buf related changes (Thomas)
- Multi Queue feature support (Niranjana)
- Enable I2C controller for Crescent Island (Raag)
- Enable NVM for Crescent Island (Sasha)
- Increase TDF timeout (Jagmeet)
- Restore engine registers before restarting schedulers after GT reset (Jan)
- Page Reclamation Support for Xe3p Platforms (Brian, Brost, Oak)
- Fix performance when pagefaults and 3d/display share resources (Brost)
- More OA MERT work (Ashutosh)
- Fix return values (Dan)
- Some log level and message improvements (Brost)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/aUXUhEgzs6hDLQuu@intel.com

+5333 -1502
+1
.mailmap
···
 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> <lstoakes@gmail.com>
 Luca Ceresoli <luca.ceresoli@bootlin.com> <luca@lucaceresoli.net>
 Luca Weiss <luca@lucaweiss.eu> <luca@z3ntu.xyz>
+Lucas De Marchi <demarchi@kernel.org> <lucas.demarchi@intel.com>
 Lukasz Luba <lukasz.luba@arm.com> <l.luba@partner.samsung.com>
 Luo Jie <quic_luoj@quicinc.com> <luoj@codeaurora.org>
 Lance Yang <lance.yang@linux.dev> <ioworker0@gmail.com>
+1 -1
Documentation/ABI/testing/sysfs-driver-intel-xe-sriov
···
 		The GT preemption timeout (PT) in [us] to be applied to all functions.
 		See sriov_admin/{pf,vf<N>}/profile/preempt_timeout_us for more details.

-	sched_priority: (RW/RO) string
+	sched_priority: (WO) string
 		The GT scheduling priority to be applied for all functions.
 		See sriov_admin/{pf,vf<N>}/profile/sched_priority for more details.
+14
Documentation/gpu/xe/xe_exec_queue.rst
···
 .. kernel-doc:: drivers/gpu/drm/xe/xe_exec_queue.c
    :doc: Execution Queue

+Multi Queue Group
+=================
+
+.. kernel-doc:: drivers/gpu/drm/xe/xe_exec_queue.c
+   :doc: Multi Queue Group
+
+.. _multi-queue-group-guc-interface:
+
+Multi Queue Group GuC interface
+===============================
+
+.. kernel-doc:: drivers/gpu/drm/xe/xe_guc_submit.c
+   :doc: Multi Queue Group GuC interface
+
 Internal API
 ============
+1 -1
MAINTAINERS
···
 F:	include/uapi/drm/i915_drm.h

 INTEL DRM XE DRIVER (Lunar Lake and newer)
-M:	Lucas De Marchi <lucas.demarchi@intel.com>
+M:	Matthew Brost <matthew.brost@intel.com>
 M:	Thomas Hellström <thomas.hellstrom@linux.intel.com>
 M:	Rodrigo Vivi <rodrigo.vivi@intel.com>
 L:	intel-xe@lists.freedesktop.org
+3
drivers/gpu/drm/drm_gpusvm.c
···
 		DMA_BIDIRECTIONAL;

 retry:
+	if (time_after(jiffies, timeout))
+		return -EBUSY;
+
 	hmm_range.notifier_seq = mmu_interval_read_begin(notifier);
 	if (drm_gpusvm_pages_valid_unlocked(gpusvm, svm_pages))
 		goto set_seqno;
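The new check turns the page-collection retry loop into a bounded wait: once the jiffies deadline passes, the caller gets -EBUSY instead of spinning indefinitely. Below is a minimal sketch of the same deadline idiom with illustrative names (retry_with_deadline and try_once are not part of drm_gpusvm, and the real timeout value is computed elsewhere in the function):

/* Illustrative only: the jiffies-deadline retry pattern used above. */
#include <linux/jiffies.h>
#include <linux/sched.h>
#include <linux/errno.h>

static int retry_with_deadline(bool (*try_once)(void *data), void *data,
			       unsigned int timeout_ms)
{
	/* Absolute deadline computed once, checked at the top of every retry. */
	unsigned long timeout = jiffies + msecs_to_jiffies(timeout_ms);

	for (;;) {
		if (time_after(jiffies, timeout))
			return -EBUSY;		/* bounded, not an endless loop */

		if (try_once(data))
			return 0;

		cond_resched();			/* yield between attempts */
	}
}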
+2
drivers/gpu/drm/xe/Makefile
···
 	xe_oa.o \
 	xe_observation.o \
 	xe_pagefault.o \
+	xe_page_reclaim.o \
 	xe_pat.o \
 	xe_pci.o \
 	xe_pcode.o \
···
 	xe_lmtt.o \
 	xe_lmtt_2l.o \
 	xe_lmtt_ml.o \
+	xe_mert.o \
 	xe_pci_sriov.o \
 	xe_sriov_packet.o \
 	xe_sriov_pf.o \
+6
drivers/gpu/drm/xe/abi/guc_actions_abi.h
···
 	XE_GUC_ACTION_DEREGISTER_G2G = 0x4508,
 	XE_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600,
 	XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC = 0x4601,
+	XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_QUEUE = 0x4602,
+	XE_GUC_ACTION_MULTI_QUEUE_CONTEXT_CGP_SYNC = 0x4603,
+	XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE = 0x4604,
+	XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CGP_CONTEXT_ERROR = 0x4605,
 	XE_GUC_ACTION_CLIENT_SOFT_RESET = 0x5507,
 	XE_GUC_ACTION_SET_ENG_UTIL_BUFF = 0x550A,
 	XE_GUC_ACTION_SET_DEVICE_ENGINE_ACTIVITY_BUFFER = 0x550C,
···
 	XE_GUC_ACTION_TLB_INVALIDATION = 0x7000,
 	XE_GUC_ACTION_TLB_INVALIDATION_DONE = 0x7001,
 	XE_GUC_ACTION_TLB_INVALIDATION_ALL = 0x7002,
+	XE_GUC_ACTION_PAGE_RECLAMATION = 0x7003,
+	XE_GUC_ACTION_PAGE_RECLAMATION_DONE = 0x7004,
 	XE_GUC_ACTION_STATE_CAPTURE_NOTIFICATION = 0x8002,
 	XE_GUC_ACTION_NOTIFY_FLUSH_LOG_BUFFER_TO_FILE = 0x8003,
 	XE_GUC_ACTION_NOTIFY_CRASH_DUMP_POSTED = 0x8004,
+57 -10
drivers/gpu/drm/xe/abi/guc_actions_sriov_abi.h
··· 502 502 #define VF2GUC_VF_RESET_RESPONSE_MSG_0_MBZ GUC_HXG_RESPONSE_MSG_0_DATA0 503 503 504 504 /** 505 - * DOC: VF2GUC_NOTIFY_RESFIX_DONE 505 + * DOC: VF2GUC_RESFIX_DONE 506 506 * 507 - * This action is used by VF to notify the GuC that the VF KMD has completed 508 - * post-migration recovery steps. 507 + * This action is used by VF to inform the GuC that the VF KMD has completed 508 + * post-migration recovery steps. From GuC VF compatibility 1.27.0 onwards, it 509 + * shall only be sent after posting RESFIX_START and that both @MARKER fields 510 + * must match. 509 511 * 510 512 * This message must be sent as `MMIO HXG Message`_. 513 + * 514 + * Updated since GuC VF compatibility 1.27.0. 511 515 * 512 516 * +---+-------+--------------------------------------------------------------+ 513 517 * | | Bits | Description | ··· 520 516 * | +-------+--------------------------------------------------------------+ 521 517 * | | 30:28 | TYPE = GUC_HXG_TYPE_REQUEST_ | 522 518 * | +-------+--------------------------------------------------------------+ 523 - * | | 27:16 | DATA0 = MBZ | 519 + * | | 27:16 | DATA0 = MARKER = MBZ (only prior 1.27.0) | 524 520 * | +-------+--------------------------------------------------------------+ 525 - * | | 15:0 | ACTION = _`GUC_ACTION_VF2GUC_NOTIFY_RESFIX_DONE` = 0x5508 | 521 + * | | 27:16 | DATA0 = MARKER - can't be zero (1.27.0+) | 522 + * | +-------+--------------------------------------------------------------+ 523 + * | | 15:0 | ACTION = _`GUC_ACTION_VF2GUC_RESFIX_DONE` = 0x5508 | 526 524 * +---+-------+--------------------------------------------------------------+ 527 525 * 528 526 * +---+-------+--------------------------------------------------------------+ ··· 537 531 * | | 27:0 | DATA0 = MBZ | 538 532 * +---+-------+--------------------------------------------------------------+ 539 533 */ 540 - #define GUC_ACTION_VF2GUC_NOTIFY_RESFIX_DONE 0x5508u 534 + #define GUC_ACTION_VF2GUC_RESFIX_DONE 0x5508u 541 535 542 - #define VF2GUC_NOTIFY_RESFIX_DONE_REQUEST_MSG_LEN GUC_HXG_REQUEST_MSG_MIN_LEN 543 - #define VF2GUC_NOTIFY_RESFIX_DONE_REQUEST_MSG_0_MBZ GUC_HXG_REQUEST_MSG_0_DATA0 536 + #define VF2GUC_RESFIX_DONE_REQUEST_MSG_LEN GUC_HXG_REQUEST_MSG_MIN_LEN 537 + #define VF2GUC_RESFIX_DONE_REQUEST_MSG_0_MARKER GUC_HXG_REQUEST_MSG_0_DATA0 544 538 545 - #define VF2GUC_NOTIFY_RESFIX_DONE_RESPONSE_MSG_LEN GUC_HXG_RESPONSE_MSG_MIN_LEN 546 - #define VF2GUC_NOTIFY_RESFIX_DONE_RESPONSE_MSG_0_MBZ GUC_HXG_RESPONSE_MSG_0_DATA0 539 + #define VF2GUC_RESFIX_DONE_RESPONSE_MSG_LEN GUC_HXG_RESPONSE_MSG_MIN_LEN 540 + #define VF2GUC_RESFIX_DONE_RESPONSE_MSG_0_MBZ GUC_HXG_RESPONSE_MSG_0_DATA0 547 541 548 542 /** 549 543 * DOC: VF2GUC_QUERY_SINGLE_KLV ··· 661 655 662 656 #define PF2GUC_SAVE_RESTORE_VF_RESPONSE_MSG_LEN GUC_HXG_RESPONSE_MSG_MIN_LEN 663 657 #define PF2GUC_SAVE_RESTORE_VF_RESPONSE_MSG_0_USED GUC_HXG_RESPONSE_MSG_0_DATA0 658 + 659 + /** 660 + * DOC: VF2GUC_RESFIX_START 661 + * 662 + * This action is used by VF to inform the GuC that the VF KMD will be starting 663 + * post-migration recovery fixups. The @MARKER sent with this action must match 664 + * with the MARKER posted in the VF2GUC_RESFIX_DONE message. 665 + * 666 + * This message must be sent as `MMIO HXG Message`_. 667 + * 668 + * Available since GuC VF compatibility 1.27.0. 
669 + * 670 + * +---+-------+--------------------------------------------------------------+ 671 + * | | Bits | Description | 672 + * +===+=======+==============================================================+ 673 + * | 0 | 31 | ORIGIN = GUC_HXG_ORIGIN_HOST_ | 674 + * | +-------+--------------------------------------------------------------+ 675 + * | | 30:28 | TYPE = GUC_HXG_TYPE_REQUEST_ | 676 + * | +-------+--------------------------------------------------------------+ 677 + * | | 27:16 | DATA0 = MARKER - can't be zero | 678 + * | +-------+--------------------------------------------------------------+ 679 + * | | 15:0 | ACTION = _`GUC_ACTION_VF2GUC_RESFIX_START` = 0x550F | 680 + * +---+-------+--------------------------------------------------------------+ 681 + * 682 + * +---+-------+--------------------------------------------------------------+ 683 + * | | Bits | Description | 684 + * +===+=======+==============================================================+ 685 + * | 0 | 31 | ORIGIN = GUC_HXG_ORIGIN_GUC_ | 686 + * | +-------+--------------------------------------------------------------+ 687 + * | | 30:28 | TYPE = GUC_HXG_TYPE_RESPONSE_SUCCESS_ | 688 + * | +-------+--------------------------------------------------------------+ 689 + * | | 27:0 | DATA0 = MBZ | 690 + * +---+-------+--------------------------------------------------------------+ 691 + */ 692 + #define GUC_ACTION_VF2GUC_RESFIX_START 0x550Fu 693 + 694 + #define VF2GUC_RESFIX_START_REQUEST_MSG_LEN GUC_HXG_REQUEST_MSG_MIN_LEN 695 + #define VF2GUC_RESFIX_START_REQUEST_MSG_0_MARKER GUC_HXG_REQUEST_MSG_0_DATA0 696 + 697 + #define VF2GUC_RESFIX_START_RESPONSE_MSG_LEN GUC_HXG_RESPONSE_MSG_MIN_LEN 698 + #define VF2GUC_RESFIX_START_RESPONSE_MSG_0_MBZ GUC_HXG_RESPONSE_MSG_0_DATA0 664 699 665 700 #endif
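Taken together, RESFIX_START and RESFIX_DONE bracket the VF's post-migration fixups: the VF posts START with a non-zero MARKER, runs its recovery steps, then posts DONE carrying the same MARKER so the GuC can pair the two. The following is a rough sketch of how such a single-dword MMIO HXG request could be assembled from the fields documented above; guc_send_mmio() is a placeholder for the VF's real MMIO send helper, and the actual driver implementation may differ:

/* Illustrative only; guc_send_mmio() is a placeholder, not a real xe helper. */
#include <linux/bitfield.h>

static int vf2guc_resfix_start(struct xe_guc *guc, u32 marker)
{
	u32 request[VF2GUC_RESFIX_START_REQUEST_MSG_LEN];

	if (!marker)	/* the ABI requires a non-zero MARKER */
		return -EINVAL;

	request[0] = FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) |
		     FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
		     FIELD_PREP(VF2GUC_RESFIX_START_REQUEST_MSG_0_MARKER, marker) |
		     FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION,
				GUC_ACTION_VF2GUC_RESFIX_START);

	/* RESFIX_DONE is built the same way, reusing the same marker value. */
	return guc_send_mmio(guc, request, ARRAY_SIZE(request));
}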
+9
drivers/gpu/drm/xe/abi/guc_klvs_abi.h
···
  * :1: NORMAL = schedule VF always, irrespective of whether it has work or not
  * :2: HIGH = schedule VF in the next time-slice after current active
  *      time-slice completes if it has active work
+ *
+ * _`GUC_KLV_VF_CFG_THRESHOLD_MULTI_LRC_COUNT` : 0x8A0D
+ *      Given that multi-LRC contexts are incompatible with SRIOV scheduler
+ *      groups and cause the latter to be turned off when registered with the
+ *      GuC, this config allows the PF to set a threshold for multi-LRC context
+ *      registrations by VFs to monitor their behavior.
  */

 #define GUC_KLV_VF_CFG_GGTT_START_KEY		0x0001
···
 #define GUC_SCHED_PRIORITY_LOW		0u
 #define GUC_SCHED_PRIORITY_NORMAL	1u
 #define GUC_SCHED_PRIORITY_HIGH		2u
+
+#define GUC_KLV_VF_CFG_THRESHOLD_MULTI_LRC_COUNT_KEY	0x8a0d
+#define GUC_KLV_VF_CFG_THRESHOLD_MULTI_LRC_COUNT_LEN	1u

 /*
  * Workaround keys:
+171
drivers/gpu/drm/xe/abi/guc_lfd_abi.h
··· 1 + /* SPDX-License-Identifier: MIT */ 2 + /* 3 + * Copyright © 2025 Intel Corporation 4 + */ 5 + 6 + #ifndef _ABI_GUC_LFD_ABI_H_ 7 + #define _ABI_GUC_LFD_ABI_H_ 8 + 9 + #include <linux/types.h> 10 + 11 + #include "guc_lic_abi.h" 12 + 13 + /* The current major version of GuC-Log-File format. */ 14 + #define GUC_LFD_FORMAT_VERSION_MAJOR 0x0001 15 + /* The current minor version of GuC-Log-File format. */ 16 + #define GUC_LFD_FORMAT_VERSION_MINOR 0x0000 17 + 18 + /** enum guc_lfd_type - Log format descriptor type */ 19 + enum guc_lfd_type { 20 + /** 21 + * @GUC_LFD_TYPE_FW_REQUIRED_RANGE_START: Start of range for 22 + * required LFDs from GuC 23 + * @GUC_LFD_TYPE_FW_VERSION: GuC Firmware Version structure. 24 + * @GUC_LFD_TYPE_GUC_DEVICE_ID: GuC microcontroller device ID. 25 + * @GUC_LFD_TYPE_TSC_FREQUENCY: Frequency of GuC timestamps. 26 + * @GUC_LFD_TYPE_GMD_ID: HW GMD ID. 27 + * @GUC_LFD_TYPE_BUILD_PLATFORM_ID: GuC build platform ID. 28 + * @GUC_LFD_TYPE_FW_REQUIRED_RANGE_END: End of range for 29 + * required LFDs from GuC 30 + */ 31 + GUC_LFD_TYPE_FW_REQUIRED_RANGE_START = 0x1, 32 + GUC_LFD_TYPE_FW_VERSION = 0x1, 33 + GUC_LFD_TYPE_GUC_DEVICE_ID = 0x2, 34 + GUC_LFD_TYPE_TSC_FREQUENCY = 0x3, 35 + GUC_LFD_TYPE_GMD_ID = 0x4, 36 + GUC_LFD_TYPE_BUILD_PLATFORM_ID = 0x5, 37 + GUC_LFD_TYPE_FW_REQUIRED_RANGE_END = 0x1FFF, 38 + 39 + /** 40 + * @GUC_LFD_TYPE_FW_OPTIONAL_RANGE_START: Start of range for 41 + * optional LFDs from GuC 42 + * @GUC_LFD_TYPE_LOG_EVENTS_BUFFER: Log-event-entries buffer. 43 + * @GUC_LFD_TYPE_FW_CRASH_DUMP: GuC generated crash-dump blob. 44 + * @GUC_LFD_TYPE_FW_OPTIONAL_RANGE_END: End of range for 45 + * optional LFDs from GuC 46 + */ 47 + GUC_LFD_TYPE_FW_OPTIONAL_RANGE_START = 0x2000, 48 + GUC_LFD_TYPE_LOG_EVENTS_BUFFER = 0x2000, 49 + GUC_LFD_TYPE_FW_CRASH_DUMP = 0x2001, 50 + GUC_LFD_TYPE_FW_OPTIONAL_RANGE_END = 0x3FFF, 51 + 52 + /** 53 + * @GUC_LFD_TYPE_KMD_REQUIRED_RANGE_START: Start of range for 54 + * required KMD LFDs 55 + * @GUC_LFD_TYPE_OS_ID: An identifier for the OS. 56 + * @GUC_LFD_TYPE_KMD_REQUIRED_RANGE_END: End of this range for 57 + * required KMD LFDs 58 + */ 59 + GUC_LFD_TYPE_KMD_REQUIRED_RANGE_START = 0x4000, 60 + GUC_LFD_TYPE_OS_ID = 0x4000, 61 + GUC_LFD_TYPE_KMD_REQUIRED_RANGE_END = 0x5FFF, 62 + 63 + /** 64 + * @GUC_LFD_TYPE_KMD_OPTIONAL_RANGE_START: Start of range for 65 + * optional KMD LFDs 66 + * @GUC_LFD_TYPE_BINARY_SCHEMA_FORMAT: Binary representation of 67 + * GuC log-events schema. 68 + * @GUC_LFD_TYPE_HOST_COMMENT: ASCII string containing comments 69 + * from the host/KMD. 70 + * @GUC_LFD_TYPE_TIMESTAMP_ANCHOR: A timestamp anchor, to convert 71 + * between host and GuC timestamp. 72 + * @GUC_LFD_TYPE_TIMESTAMP_ANCHOR_CONFIG: Timestamp anchor 73 + * configuration, definition of timestamp frequency and bit width. 
74 + * @GUC_LFD_TYPE_KMD_OPTIONAL_RANGE_END: End of this range for 75 + * optional KMD LFDs 76 + */ 77 + GUC_LFD_TYPE_KMD_OPTIONAL_RANGE_START = 0x6000, 78 + GUC_LFD_TYPE_BINARY_SCHEMA_FORMAT = 0x6000, 79 + GUC_LFD_TYPE_HOST_COMMENT = 0x6001, 80 + GUC_LFD_TYPE_TIMESTAMP_ANCHOR = 0x6002, 81 + GUC_LFD_TYPE_TIMESTAMP_ANCHOR_CONFIG = 0x6003, 82 + GUC_LFD_TYPE_KMD_OPTIONAL_RANGE_END = 0x7FFF, 83 + 84 + /* 85 + * @GUC_LFD_TYPE_RESERVED_RANGE_START: Start of reserved range 86 + * @GUC_LFD_TYPE_RESERVED_RANGE_END: End of reserved range 87 + */ 88 + GUC_LFD_TYPE_RESERVED_RANGE_START = 0x8000, 89 + GUC_LFD_TYPE_RESERVED_RANGE_END = 0xFFFF, 90 + }; 91 + 92 + /** enum guc_lfd_os_type - OS Type LFD-ID */ 93 + enum guc_lfd_os_type { 94 + /** @GUC_LFD_OS_TYPE_OSID_WIN: Windows OS */ 95 + GUC_LFD_OS_TYPE_OSID_WIN = 0x1, 96 + /** @GUC_LFD_OS_TYPE_OSID_LIN: Linux OS */ 97 + GUC_LFD_OS_TYPE_OSID_LIN = 0x2, 98 + /** @GUC_LFD_OS_TYPE_OSID_VMW: VMWare OS */ 99 + GUC_LFD_OS_TYPE_OSID_VMW = 0x3, 100 + /** @GUC_LFD_OS_TYPE_OSID_OTHER: Other */ 101 + GUC_LFD_OS_TYPE_OSID_OTHER = 0x4, 102 + }; 103 + 104 + /** struct guc_lfd_data - A generic header structure for all LFD blocks */ 105 + struct guc_lfd_data { 106 + /** @header: A 32 bits dword, contains multiple bit fields */ 107 + u32 header; 108 + /* LFD type. See guc_lfd_type */ 109 + #define GUC_LFD_DATA_HEADER_MASK_TYPE GENMASK(31, 16) 110 + #define GUC_LFD_DATA_HEADER_MASK_MAGIC GENMASK(15, 0) 111 + 112 + /** @data_count: Number of dwords the `data` field contains. */ 113 + u32 data_count; 114 + /** @data: Data defined by GUC_LFD_DATA_HEADER_MASK_TYPE */ 115 + u32 data[] __counted_by(data_count); 116 + } __packed; 117 + 118 + /** 119 + * struct guc_lfd_data_log_events_buf - GuC Log Events Buffer. 120 + * This is optional fw LFD data 121 + */ 122 + struct guc_lfd_data_log_events_buf { 123 + /** 124 + * @log_events_format_version: version of GuC log format of buffer 125 + */ 126 + u32 log_events_format_version; 127 + /** 128 + * @log_event: The log event data. 129 + * Size in dwords is LFD block size - 1. 130 + */ 131 + u32 log_event[]; 132 + } __packed; 133 + 134 + /** struct guc_lfd_data_os_info - OS Version Information. */ 135 + struct guc_lfd_data_os_info { 136 + /** 137 + * @os_id: enum values to identify the OS brand. 138 + * See guc_lfd_os_type for the range of types 139 + */ 140 + u32 os_id; 141 + /** 142 + * @build_version: ASCII string containing OS build version 143 + * information based on os_id. String is padded with null 144 + * characters to ensure its DWORD aligned. 145 + * Size in dwords is LFD block size - 1. 146 + */ 147 + char build_version[]; 148 + } __packed; 149 + 150 + /** 151 + * struct guc_logfile_header - Header of GuC Log Streaming-LFD-File Format. 152 + * This structure encapsulates the layout of the guc-log-file format 153 + */ 154 + struct guc_lfd_file_header { 155 + /** 156 + * @magic: A magic number set by producer of a GuC log file to 157 + * identify that file is a valid guc-log-file containing a stream 158 + * of LFDs. 159 + */ 160 + u64 magic; 161 + /** @version: Version of this file format layout */ 162 + u32 version; 163 + #define GUC_LFD_FILE_HEADER_VERSION_MASK_MAJOR GENMASK(31, 16) 164 + #define GUC_LFD_FILE_HEADER_VERSION_MASK_MINOR GENMASK(15, 0) 165 + 166 + /** @stream: A stream of one or more guc_lfd_data LFD blocks 167 + */ 168 + u32 stream[]; 169 + } __packed; 170 + 171 + #endif
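A GuC log file is therefore a fixed header followed by a stream of self-describing guc_lfd_data blocks, each sized by its data_count field. A hedged sketch of walking that stream using the masks above (walk_lfd_stream is illustrative and not part of the driver; magic and version validation of the file header is omitted):

/* Illustrative LFD stream walker; assumes 'stream' holds 'dwords' valid dwords. */
#include <linux/bitfield.h>

static void walk_lfd_stream(const u32 *stream, u32 dwords)
{
	u32 pos = 0;

	/* Every block needs at least the header and data_count dwords. */
	while (pos + 2 <= dwords) {
		const struct guc_lfd_data *lfd = (const void *)&stream[pos];
		u32 type = FIELD_GET(GUC_LFD_DATA_HEADER_MASK_TYPE, lfd->header);

		if (pos + 2 + lfd->data_count > dwords)
			break;	/* truncated block, stop walking */

		if (type == GUC_LFD_TYPE_FW_VERSION)
			pr_info("LFD: GuC firmware version block (%u dwords)\n",
				lfd->data_count);

		pos += 2 + lfd->data_count;	/* advance past header + payload */
	}
}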
+77
drivers/gpu/drm/xe/abi/guc_lic_abi.h
··· 1 + /* SPDX-License-Identifier: MIT */ 2 + /* 3 + * Copyright © 2025 Intel Corporation 4 + */ 5 + 6 + #ifndef _ABI_GUC_LIC_ABI_H_ 7 + #define _ABI_GUC_LIC_ABI_H_ 8 + 9 + #include <linux/types.h> 10 + 11 + /** 12 + * enum guc_lic_type - Log Init Config KLV IDs. 13 + */ 14 + enum guc_lic_type { 15 + /** 16 + * @GUC_LIC_TYPE_GUC_SW_VERSION: GuC firmware version. Value 17 + * is a 32 bit number represented by guc_sw_version. 18 + */ 19 + GUC_LIC_TYPE_GUC_SW_VERSION = 0x1, 20 + /** 21 + * @GUC_LIC_TYPE_GUC_DEVICE_ID: GuC device id. Value is a 32 22 + * bit. 23 + */ 24 + GUC_LIC_TYPE_GUC_DEVICE_ID = 0x2, 25 + /** 26 + * @GUC_LIC_TYPE_TSC_FREQUENCY: GuC timestamp counter 27 + * frequency. Value is a 32 bit number representing frequency in 28 + * kHz. This timestamp is utilized in log entries, timer and 29 + * for engine utilization tracking. 30 + */ 31 + GUC_LIC_TYPE_TSC_FREQUENCY = 0x3, 32 + /** 33 + * @GUC_LIC_TYPE_GMD_ID: HW GMD ID. Value is a 32 bit number 34 + * representing graphics, media and display HW architecture IDs. 35 + */ 36 + GUC_LIC_TYPE_GMD_ID = 0x4, 37 + /** 38 + * @GUC_LIC_TYPE_BUILD_PLATFORM_ID: GuC build platform ID. 39 + * Value is 32 bits. 40 + */ 41 + GUC_LIC_TYPE_BUILD_PLATFORM_ID = 0x5, 42 + }; 43 + 44 + /** 45 + * struct guc_lic - GuC LIC (Log-Init-Config) structure. 46 + * 47 + * This is populated by the GUC at log init time and is located in the log 48 + * buffer memory allocation. 49 + */ 50 + struct guc_lic { 51 + /** 52 + * @magic: A magic number set by GuC to identify that this 53 + * structure contains valid information: magic = GUC_LIC_MAGIC. 54 + */ 55 + u32 magic; 56 + #define GUC_LIC_MAGIC 0x8086900D 57 + /** 58 + * @version: The version of the this structure. 59 + * Major and minor version number are represented as bit fields. 60 + */ 61 + u32 version; 62 + #define GUC_LIC_VERSION_MASK_MAJOR GENMASK(31, 16) 63 + #define GUC_LIC_VERSION_MASK_MINOR GENMASK(15, 0) 64 + 65 + #define GUC_LIC_VERSION_MAJOR 1u 66 + #define GUC_LIC_VERSION_MINOR 0u 67 + 68 + /** @data_count: Number of dwords the `data` array contains. */ 69 + u32 data_count; 70 + /** 71 + * @data: Array of dwords representing a list of LIC KLVs of 72 + * type guc_klv_generic with keys represented by guc_lic_type 73 + */ 74 + u32 data[] __counted_by(data_count); 75 + } __packed; 76 + 77 + #endif
+38 -4
drivers/gpu/drm/xe/abi/guc_log_abi.h
··· 8 8 9 9 #include <linux/types.h> 10 10 11 + /** 12 + * DOC: GuC Log buffer Layout 13 + * 14 + * The in-memory log buffer layout is as follows:: 15 + * 16 + * +===============================+ 0000h 17 + * | Crash dump state header | ^ 18 + * +-------------------------------+ 32B | 19 + * | Debug state header | | 20 + * +-------------------------------+ 64B 4KB 21 + * | Capture state header | | 22 + * +-------------------------------+ 96B | 23 + * | | v 24 + * +===============================+ <--- EVENT_DATA_OFFSET 25 + * | Event logs(raw data) | ^ 26 + * | | | 27 + * | | EVENT_DATA_BUFFER_SIZE 28 + * | | | 29 + * | | v 30 + * +===============================+ <--- CRASH_DUMP_OFFSET 31 + * | Crash Dump(raw data) | ^ 32 + * | | | 33 + * | | CRASH_DUMP_BUFFER_SIZE 34 + * | | | 35 + * | | v 36 + * +===============================+ <--- STATE_CAPTURE_OFFSET 37 + * | Error state capture(raw data) | ^ 38 + * | | | 39 + * | | STATE_CAPTURE_BUFFER_SIZE 40 + * | | | 41 + * | | v 42 + * +===============================+ Total: GUC_LOG_SIZE 43 + */ 44 + 11 45 /* GuC logging buffer types */ 12 - enum guc_log_buffer_type { 13 - GUC_LOG_BUFFER_CRASH_DUMP, 14 - GUC_LOG_BUFFER_DEBUG, 15 - GUC_LOG_BUFFER_CAPTURE, 46 + enum guc_log_type { 47 + GUC_LOG_TYPE_EVENT_DATA, 48 + GUC_LOG_TYPE_CRASH_DUMP, 49 + GUC_LOG_TYPE_STATE_CAPTURE, 16 50 }; 17 51 18 52 #define GUC_LOG_BUFFER_TYPE_MAX 3
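Given that layout, the offset of each raw-data section is just the running sum of the sections before it. A small sketch of that computation using the renamed enum; the *_SIZE values below are placeholders, since the real sizes are chosen when the log buffer is allocated:

/* Placeholder sizes for illustration; real values come from the log BO setup. */
#include <linux/sizes.h>

#define GUC_LOG_HEADER_AREA_SIZE	SZ_4K
#define EVENT_DATA_BUFFER_SIZE		SZ_1M
#define CRASH_DUMP_BUFFER_SIZE		SZ_8K

static u32 guc_log_section_offset(enum guc_log_type type)
{
	u32 offset = GUC_LOG_HEADER_AREA_SIZE;		/* EVENT_DATA_OFFSET */

	if (type == GUC_LOG_TYPE_EVENT_DATA)
		return offset;

	offset += EVENT_DATA_BUFFER_SIZE;		/* CRASH_DUMP_OFFSET */
	if (type == GUC_LOG_TYPE_CRASH_DUMP)
		return offset;

	return offset + CRASH_DUMP_BUFFER_SIZE;		/* STATE_CAPTURE_OFFSET */
}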
+9 -14
drivers/gpu/drm/xe/display/xe_fb_pin.c
··· 210 210 /* TODO: Consider sharing framebuffer mapping? 211 211 * embed i915_vma inside intel_framebuffer 212 212 */ 213 - xe_pm_runtime_get_noresume(xe); 214 - ret = mutex_lock_interruptible(&ggtt->lock); 213 + guard(xe_pm_runtime_noresume)(xe); 214 + ACQUIRE(mutex_intr, lock)(&ggtt->lock); 215 + ret = ACQUIRE_ERR(mutex_intr, &lock); 215 216 if (ret) 216 - goto out; 217 + return ret; 217 218 218 219 align = XE_PAGE_SIZE; 219 220 if (xe_bo_is_vram(bo) && ggtt->flags & XE_GGTT_FLAGS_64K) ··· 224 223 vma->node = bo->ggtt_node[tile0->id]; 225 224 } else if (view->type == I915_GTT_VIEW_NORMAL) { 226 225 vma->node = xe_ggtt_node_init(ggtt); 227 - if (IS_ERR(vma->node)) { 228 - ret = PTR_ERR(vma->node); 229 - goto out_unlock; 230 - } 226 + if (IS_ERR(vma->node)) 227 + return PTR_ERR(vma->node); 231 228 232 229 ret = xe_ggtt_node_insert_locked(vma->node, xe_bo_size(bo), align, 0); 233 230 if (ret) { 234 231 xe_ggtt_node_fini(vma->node); 235 - goto out_unlock; 232 + return ret; 236 233 } 237 234 238 235 xe_ggtt_map_bo(ggtt, vma->node, bo, xe->pat.idx[XE_CACHE_NONE]); ··· 244 245 vma->node = xe_ggtt_node_init(ggtt); 245 246 if (IS_ERR(vma->node)) { 246 247 ret = PTR_ERR(vma->node); 247 - goto out_unlock; 248 + return ret; 248 249 } 249 250 250 251 ret = xe_ggtt_node_insert_locked(vma->node, size, align, 0); 251 252 if (ret) { 252 253 xe_ggtt_node_fini(vma->node); 253 - goto out_unlock; 254 + return ret; 254 255 } 255 256 256 257 ggtt_ofs = vma->node->base.start; ··· 264 265 rot_info->plane[i].dst_stride); 265 266 } 266 267 267 - out_unlock: 268 - mutex_unlock(&ggtt->lock); 269 - out: 270 - xe_pm_runtime_put(xe); 271 268 return ret; 272 269 } 273 270
+9 -22
drivers/gpu/drm/xe/display/xe_hdcp_gsc.c
··· 38 38 struct xe_tile *tile = xe_device_get_root_tile(xe); 39 39 struct xe_gt *gt = tile->media_gt; 40 40 struct xe_gsc *gsc = &gt->uc.gsc; 41 - bool ret = true; 42 - unsigned int fw_ref; 43 41 44 42 if (!gsc || !xe_uc_fw_is_enabled(&gsc->fw)) { 45 43 drm_dbg_kms(&xe->drm, ··· 45 47 return false; 46 48 } 47 49 48 - xe_pm_runtime_get(xe); 49 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GSC); 50 - if (!fw_ref) { 50 + guard(xe_pm_runtime)(xe); 51 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GSC); 52 + if (!fw_ref.domains) { 51 53 drm_dbg_kms(&xe->drm, 52 54 "failed to get forcewake to check proxy status\n"); 53 - ret = false; 54 - goto out; 55 + return false; 55 56 } 56 57 57 - if (!xe_gsc_proxy_init_done(gsc)) 58 - ret = false; 59 - 60 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 61 - out: 62 - xe_pm_runtime_put(xe); 63 - return ret; 58 + return xe_gsc_proxy_init_done(gsc); 64 59 } 65 60 66 61 /*This function helps allocate memory for the command that we will send to gsc cs */ ··· 159 168 u32 addr_out_off, addr_in_wr_off = 0; 160 169 int ret, tries = 0; 161 170 162 - if (msg_in_len > max_msg_size || msg_out_len > max_msg_size) { 163 - ret = -ENOSPC; 164 - goto out; 165 - } 171 + if (msg_in_len > max_msg_size || msg_out_len > max_msg_size) 172 + return -ENOSPC; 166 173 167 174 msg_size_in = msg_in_len + HDCP_GSC_HEADER_SIZE; 168 175 msg_size_out = msg_out_len + HDCP_GSC_HEADER_SIZE; 169 176 addr_out_off = PAGE_SIZE; 170 177 171 178 host_session_id = xe_gsc_create_host_session_id(); 172 - xe_pm_runtime_get_noresume(xe); 179 + guard(xe_pm_runtime_noresume)(xe); 173 180 addr_in_wr_off = xe_gsc_emit_header(xe, &gsc_context->hdcp_bo->vmap, 174 181 addr_in_wr_off, HECI_MEADDRESS_HDCP, 175 182 host_session_id, msg_in_len); ··· 192 203 } while (++tries < 20); 193 204 194 205 if (ret) 195 - goto out; 206 + return ret; 196 207 197 208 xe_map_memcpy_from(xe, msg_out, &gsc_context->hdcp_bo->vmap, 198 209 addr_out_off + HDCP_GSC_HEADER_SIZE, 199 210 msg_out_len); 200 211 201 - out: 202 - xe_pm_runtime_put(xe); 203 212 return ret; 204 213 } 205 214
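The conversions above lean on the kernel's scope-based cleanup helpers from <linux/cleanup.h>: guard() takes a reference for the enclosing scope and drops it on every return path, while CLASS()/ACQUIRE() do the same for objects that also carry a value (such as the forcewake domain mask checked via fw_ref.domains). A rough sketch of how a guard class like xe_pm_runtime is presumably declared; the real definition lives in the xe PM headers and may differ:

#include <linux/cleanup.h>

/*
 * Presumed shape of the guard class used above: take a runtime PM reference
 * on scope entry, release it automatically on scope exit.
 */
DEFINE_GUARD(xe_pm_runtime, struct xe_device *,
	     xe_pm_runtime_get(_T), xe_pm_runtime_put(_T));

/* do_work() is a stand-in; the point is that no explicit put is needed. */
static int example(struct xe_device *xe)
{
	guard(xe_pm_runtime)(xe);

	if (do_work(xe))
		return -EIO;	/* PM reference dropped here automatically */

	return 0;		/* ...and here */
}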
+1
drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
···

 #define GFX_OP_PIPE_CONTROL(len)	((0x3<<29)|(0x3<<27)|(0x2<<24)|((len)-2))

+#define   PIPE_CONTROL0_QUEUE_DRAIN_MODE		BIT(12)
 #define   PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE	BIT(10)	/* gen12 */
 #define   PIPE_CONTROL0_HDC_PIPELINE_FLUSH		BIT(9)	/* gen12 */
+3
drivers/gpu/drm/xe/regs/xe_gt_regs.h
···

 #define MIRROR_FUSE1				XE_REG(0x911c)

+#define FUSE2					XE_REG(0x9120)
+#define   PRODUCTION_HW				REG_BIT(2)
+
 #define MIRROR_L3BANK_ENABLE			XE_REG(0x9130)
 #define   XE3_L3BANK_ENABLE			REG_GENMASK(31, 0)
+1
drivers/gpu/drm/xe/regs/xe_gtt_defs.h
···
 #define XELPG_GGTT_PTE_PAT0	BIT_ULL(52)
 #define XELPG_GGTT_PTE_PAT1	BIT_ULL(53)

+#define XE_PTE_ADDR_MASK	GENMASK_ULL(51, 12)
 #define GGTT_PTE_VFID		GENMASK_ULL(11, 2)

 #define GUC_GGTT_TOP		0xFEE00000
+3
drivers/gpu/drm/xe/regs/xe_guc_regs.h
···
 #define GUC_SEND_INTERRUPT		XE_REG(0xc4c8)
 #define   GUC_SEND_TRIGGER		REG_BIT(0)

+#define GUC_INTR_CHICKEN		XE_REG(0xc50c)
+#define   DISABLE_SIGNALING_ENGINES	REG_BIT(1)
+
 #define GUC_BCS_RCS_IER			XE_REG(0xc550)
 #define GUC_VCS2_VCS1_IER		XE_REG(0xc554)
 #define GUC_WD_VECS_IER			XE_REG(0xc558)
+1
drivers/gpu/drm/xe/regs/xe_irq_regs.h
···
 #define   GU_MISC_IRQ			REG_BIT(29)
 #define   ERROR_IRQ(x)			REG_BIT(26 + (x))
 #define   DISPLAY_IRQ			REG_BIT(16)
+#define   SOC_H2DMEMINT_IRQ		REG_BIT(13)
 #define   I2C_IRQ			REG_BIT(12)
 #define   GT_DW_IRQ(x)			REG_BIT(x)
+21
drivers/gpu/drm/xe/regs/xe_mert_regs.h
···
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2025 Intel Corporation
+ */
+
+#ifndef _XE_MERT_REGS_H_
+#define _XE_MERT_REGS_H_
+
+#include "regs/xe_reg_defs.h"
+
+#define MERT_LMEM_CFG			XE_REG(0x1448b0)
+
+#define MERT_TLB_CT_INTR_ERR_ID_PORT	XE_REG(0x145190)
+#define MERT_TLB_CT_VFID_MASK		REG_GENMASK(16, 9)
+#define MERT_TLB_CT_ERROR_MASK		REG_GENMASK(5, 0)
+#define MERT_TLB_CT_LMTT_FAULT		0x05
+
+#define MERT_TLB_INV_DESC_A		XE_REG(0x14cf7c)
+#define MERT_TLB_INV_DESC_A_VALID	REG_BIT(0)
+
+#endif /* _XE_MERT_REGS_H_ */
+17
drivers/gpu/drm/xe/regs/xe_oa_regs.h
···
 #define OAM_COMPRESSION_T3_CONTROL	XE_REG(0x1c2e00)
 #define   OAM_LAT_MEASURE_ENABLE	REG_BIT(4)

+/* Actual address is MEDIA_GT_GSI_OFFSET + the base addr below */
+#define XE_OAM_SAG_BASE			0x13000
+#define XE_OAM_SCMI_0_BASE		0x14000
+#define XE_OAM_SCMI_1_BASE		0x14800
+#define XE_OAM_SAG_BASE_ADJ		(MEDIA_GT_GSI_OFFSET + XE_OAM_SAG_BASE)
+#define XE_OAM_SCMI_0_BASE_ADJ		(MEDIA_GT_GSI_OFFSET + XE_OAM_SCMI_0_BASE)
+#define XE_OAM_SCMI_1_BASE_ADJ		(MEDIA_GT_GSI_OFFSET + XE_OAM_SCMI_1_BASE)
+
+#define OAMERT_CONTROL			XE_REG(0x1453a0)
+#define OAMERT_DEBUG			XE_REG(0x1453a4)
+#define OAMERT_STATUS			XE_REG(0x1453a8)
+#define OAMERT_HEAD_POINTER		XE_REG(0x1453ac)
+#define OAMERT_TAIL_POINTER		XE_REG(0x1453b0)
+#define OAMERT_BUFFER			XE_REG(0x1453b4)
+#define OAMERT_CONTEXT_CONTROL		XE_REG(0x1453c8)
+#define OAMERT_MMIO_TRG			XE_REG(0x1453cc)
+
 #endif
+54
drivers/gpu/drm/xe/tests/xe_args_test.c
··· 78 78 #undef buz 79 79 } 80 80 81 + static void if_args_example(struct kunit *test) 82 + { 83 + enum { Z = 1, Q }; 84 + 85 + #define foo X, Y 86 + #define bar IF_ARGS(Z, Q, foo) 87 + #define buz IF_ARGS(Z, Q, DROP_FIRST_ARG(FIRST_ARG(foo))) 88 + 89 + KUNIT_EXPECT_EQ(test, bar, Z); 90 + KUNIT_EXPECT_EQ(test, buz, Q); 91 + KUNIT_EXPECT_STREQ(test, __stringify(bar), "Z"); 92 + KUNIT_EXPECT_STREQ(test, __stringify(buz), "Q"); 93 + 94 + #undef foo 95 + #undef bar 96 + #undef buz 97 + } 98 + 81 99 static void sep_comma_example(struct kunit *test) 82 100 { 83 101 #define foo(f) f(X) f(Y) f(Z) f(Q) ··· 216 198 KUNIT_EXPECT_STREQ(test, __stringify(LAST_ARG(MAX_ARGS)), "-12"); 217 199 } 218 200 201 + static void if_args_test(struct kunit *test) 202 + { 203 + bool with_args = true; 204 + bool no_args = false; 205 + enum { X = 100 }; 206 + 207 + KUNIT_EXPECT_TRUE(test, IF_ARGS(true, false, FOO_ARGS)); 208 + KUNIT_EXPECT_FALSE(test, IF_ARGS(true, false, NO_ARGS)); 209 + 210 + KUNIT_EXPECT_TRUE(test, CONCATENATE(IF_ARGS(with, no, FOO_ARGS), _args)); 211 + KUNIT_EXPECT_FALSE(test, CONCATENATE(IF_ARGS(with, no, NO_ARGS), _args)); 212 + 213 + KUNIT_EXPECT_STREQ(test, __stringify(IF_ARGS(yes, no, FOO_ARGS)), "yes"); 214 + KUNIT_EXPECT_STREQ(test, __stringify(IF_ARGS(yes, no, NO_ARGS)), "no"); 215 + 216 + KUNIT_EXPECT_EQ(test, IF_ARGS(CALL_ARGS(COUNT_ARGS, FOO_ARGS), -1, FOO_ARGS), 4); 217 + KUNIT_EXPECT_EQ(test, IF_ARGS(CALL_ARGS(COUNT_ARGS, FOO_ARGS), -1, NO_ARGS), -1); 218 + KUNIT_EXPECT_EQ(test, IF_ARGS(CALL_ARGS(COUNT_ARGS, NO_ARGS), -1, FOO_ARGS), 0); 219 + KUNIT_EXPECT_EQ(test, IF_ARGS(CALL_ARGS(COUNT_ARGS, NO_ARGS), -1, NO_ARGS), -1); 220 + 221 + KUNIT_EXPECT_EQ(test, 222 + CALL_ARGS(FIRST_ARG, 223 + CALL_ARGS(CONCATENATE, IF_ARGS(FOO, MAX, FOO_ARGS), _ARGS)), X); 224 + KUNIT_EXPECT_EQ(test, 225 + CALL_ARGS(FIRST_ARG, 226 + CALL_ARGS(CONCATENATE, IF_ARGS(FOO, MAX, NO_ARGS), _ARGS)), -1); 227 + KUNIT_EXPECT_EQ(test, 228 + CALL_ARGS(COUNT_ARGS, 229 + CALL_ARGS(CONCATENATE, IF_ARGS(FOO, MAX, FOO_ARGS), _ARGS)), 4); 230 + KUNIT_EXPECT_EQ(test, 231 + CALL_ARGS(COUNT_ARGS, 232 + CALL_ARGS(CONCATENATE, IF_ARGS(FOO, MAX, NO_ARGS), _ARGS)), 12); 233 + } 234 + 219 235 static struct kunit_case args_tests[] = { 220 236 KUNIT_CASE(count_args_test), 221 237 KUNIT_CASE(call_args_example), ··· 261 209 KUNIT_CASE(last_arg_example), 262 210 KUNIT_CASE(last_arg_test), 263 211 KUNIT_CASE(pick_arg_example), 212 + KUNIT_CASE(if_args_example), 213 + KUNIT_CASE(if_args_test), 264 214 KUNIT_CASE(sep_comma_example), 265 215 {} 266 216 };
+2 -8
drivers/gpu/drm/xe/tests/xe_bo.c
··· 185 185 return 0; 186 186 } 187 187 188 - xe_pm_runtime_get(xe); 189 - 188 + guard(xe_pm_runtime)(xe); 190 189 for_each_tile(tile, xe, id) { 191 190 /* For igfx run only for primary tile */ 192 191 if (!IS_DGFX(xe) && id > 0) 193 192 continue; 194 193 ccs_test_run_tile(xe, tile, test); 195 194 } 196 - 197 - xe_pm_runtime_put(xe); 198 195 199 196 return 0; 200 197 } ··· 353 356 return 0; 354 357 } 355 358 356 - xe_pm_runtime_get(xe); 357 - 359 + guard(xe_pm_runtime)(xe); 358 360 for_each_tile(tile, xe, id) 359 361 evict_test_run_tile(xe, tile, test); 360 - 361 - xe_pm_runtime_put(xe); 362 362 363 363 return 0; 364 364 }
+1 -2
drivers/gpu/drm/xe/tests/xe_dma_buf.c
··· 266 266 const struct dma_buf_test_params *params; 267 267 struct kunit *test = kunit_get_current_test(); 268 268 269 - xe_pm_runtime_get(xe); 269 + guard(xe_pm_runtime)(xe); 270 270 for (params = test_params; params->mem_mask; ++params) { 271 271 struct dma_buf_test_params p = *params; 272 272 ··· 274 274 test->priv = &p; 275 275 xe_test_dmabuf_import_same_driver(xe); 276 276 } 277 - xe_pm_runtime_put(xe); 278 277 279 278 /* A non-zero return would halt iteration over driver devices */ 280 279 return 0;
+2 -8
drivers/gpu/drm/xe/tests/xe_migrate.c
··· 344 344 struct xe_tile *tile; 345 345 int id; 346 346 347 - xe_pm_runtime_get(xe); 348 - 347 + guard(xe_pm_runtime)(xe); 349 348 for_each_tile(tile, xe, id) { 350 349 struct xe_migrate *m = tile->migrate; 351 350 struct drm_exec *exec = XE_VALIDATION_OPT_OUT; ··· 354 355 xe_migrate_sanity_test(m, test, exec); 355 356 xe_vm_unlock(m->q->vm); 356 357 } 357 - 358 - xe_pm_runtime_put(xe); 359 358 360 359 return 0; 361 360 } ··· 756 759 return 0; 757 760 } 758 761 759 - xe_pm_runtime_get(xe); 760 - 762 + guard(xe_pm_runtime)(xe); 761 763 for_each_tile(tile, xe, id) 762 764 validate_ccs_test_run_tile(xe, tile, test); 763 - 764 - xe_pm_runtime_put(xe); 765 765 766 766 return 0; 767 767 }
+8 -19
drivers/gpu/drm/xe/tests/xe_mocs.c
··· 43 43 { 44 44 struct kunit *test = kunit_get_current_test(); 45 45 u32 l3cc, l3cc_expected; 46 - unsigned int fw_ref, i; 46 + unsigned int i; 47 47 u32 reg_val; 48 48 49 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL); 50 - if (!xe_force_wake_ref_has_domain(fw_ref, XE_FORCEWAKE_ALL)) { 51 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 49 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FORCEWAKE_ALL); 50 + if (!xe_force_wake_ref_has_domain(fw_ref.domains, XE_FORCEWAKE_ALL)) 52 51 KUNIT_FAIL_AND_ABORT(test, "Forcewake Failed.\n"); 53 - } 54 52 55 53 for (i = 0; i < info->num_mocs_regs; i++) { 56 54 if (!(i & 1)) { ··· 72 74 KUNIT_EXPECT_EQ_MSG(test, l3cc_expected, l3cc, 73 75 "l3cc idx=%u has incorrect val.\n", i); 74 76 } 75 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 76 77 } 77 78 78 79 static void read_mocs_table(struct xe_gt *gt, ··· 79 82 { 80 83 struct kunit *test = kunit_get_current_test(); 81 84 u32 mocs, mocs_expected; 82 - unsigned int fw_ref, i; 85 + unsigned int i; 83 86 u32 reg_val; 84 87 85 88 KUNIT_EXPECT_TRUE_MSG(test, info->unused_entries_index, 86 89 "Unused entries index should have been defined\n"); 87 90 88 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); 89 - KUNIT_ASSERT_NE_MSG(test, fw_ref, 0, "Forcewake Failed.\n"); 91 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT); 92 + KUNIT_ASSERT_NE_MSG(test, fw_ref.domains, 0, "Forcewake Failed.\n"); 90 93 91 94 for (i = 0; i < info->num_mocs_regs; i++) { 92 95 if (regs_are_mcr(gt)) ··· 103 106 KUNIT_EXPECT_EQ_MSG(test, mocs_expected, mocs, 104 107 "mocs reg 0x%x has incorrect val.\n", i); 105 108 } 106 - 107 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 108 109 } 109 110 110 111 static int mocs_kernel_test_run_device(struct xe_device *xe) ··· 115 120 unsigned int flags; 116 121 int id; 117 122 118 - xe_pm_runtime_get(xe); 119 - 123 + guard(xe_pm_runtime)(xe); 120 124 for_each_gt(gt, xe, id) { 121 125 flags = live_mocs_init(&mocs, gt); 122 126 if (flags & HAS_GLOBAL_MOCS) ··· 123 129 if (flags & HAS_LNCF_MOCS) 124 130 read_l3cc_table(gt, &mocs.table); 125 131 } 126 - 127 - xe_pm_runtime_put(xe); 128 132 129 133 return 0; 130 134 } ··· 147 155 int id; 148 156 struct kunit *test = kunit_get_current_test(); 149 157 150 - xe_pm_runtime_get(xe); 151 - 158 + guard(xe_pm_runtime)(xe); 152 159 for_each_gt(gt, xe, id) { 153 160 flags = live_mocs_init(&mocs, gt); 154 161 kunit_info(test, "mocs_reset_test before reset\n"); ··· 164 173 if (flags & HAS_LNCF_MOCS) 165 174 read_l3cc_table(gt, &mocs.table); 166 175 } 167 - 168 - xe_pm_runtime_put(xe); 169 176 170 177 return 0; 171 178 }
+27
drivers/gpu/drm/xe/xe_args.h
··· 122 122 #define PICK_ARG12(args...) PICK_ARG11(DROP_FIRST_ARG(args)) 123 123 124 124 /** 125 + * IF_ARGS() - Make selection based on optional argument list. 126 + * @then: token to return if arguments are present 127 + * @else: token to return if arguments are empty 128 + * @...: arguments to check (optional) 129 + * 130 + * This macro allows to select a token based on the presence of the argument list. 131 + * 132 + * Example: 133 + * 134 + * #define foo X, Y 135 + * #define bar IF_ARGS(Z, Q, foo) 136 + * #define buz IF_ARGS(Z, Q, DROP_FIRST_ARG(FIRST_ARG(foo))) 137 + * 138 + * With above definitions bar expands to Z while buz expands to Q. 139 + */ 140 + #if defined(CONFIG_CC_IS_CLANG) || GCC_VERSION >= 100100 141 + #define IF_ARGS(then, else, ...) FIRST_ARG(__VA_OPT__(then,) else) 142 + #else 143 + #define IF_ARGS(then, else, ...) _IF_ARGS(then, else, CALL_ARGS(FIRST_ARG, __VA_ARGS__)) 144 + #define _IF_ARGS(then, else, ...) __IF_ARGS(then, else, CALL_ARGS(COUNT_ARGS, __VA_ARGS__)) 145 + #define __IF_ARGS(then, else, n) ___IF_ARGS(then, else, CALL_ARGS(CONCATENATE, ___IF_ARG, n)) 146 + #define ___IF_ARGS(then, else, if) CALL_ARGS(if, then, else) 147 + #define ___IF_ARG1(then, else) then 148 + #define ___IF_ARG0(then, else) else 149 + #endif 150 + 151 + /** 125 152 * ARGS_SEP_COMMA - Definition of a comma character. 126 153 * 127 154 * This definition can be used in cases where any intermediate macro expects
+16 -10
drivers/gpu/drm/xe/xe_bo.c
··· 516 516 * non-coherent and require a CPU:WC mapping. 517 517 */ 518 518 if ((!bo->cpu_caching && bo->flags & XE_BO_FLAG_SCANOUT) || 519 - (xe->info.graphics_verx100 >= 1270 && 520 - bo->flags & XE_BO_FLAG_PAGETABLE)) 519 + (!xe->info.has_cached_pt && bo->flags & XE_BO_FLAG_PAGETABLE)) 521 520 caching = ttm_write_combined; 522 521 } 523 522 ··· 2025 2026 struct ttm_buffer_object *ttm_bo = vma->vm_private_data; 2026 2027 struct xe_bo *bo = ttm_to_xe_bo(ttm_bo); 2027 2028 struct xe_device *xe = xe_bo_device(bo); 2028 - int ret; 2029 2029 2030 - xe_pm_runtime_get(xe); 2031 - ret = ttm_bo_vm_access(vma, addr, buf, len, write); 2032 - xe_pm_runtime_put(xe); 2033 - 2034 - return ret; 2030 + guard(xe_pm_runtime)(xe); 2031 + return ttm_bo_vm_access(vma, addr, buf, len, write); 2035 2032 } 2036 2033 2037 2034 /** ··· 3171 3176 if (XE_IOCTL_DBG(xe, args->flags & 3172 3177 ~(DRM_XE_GEM_CREATE_FLAG_DEFER_BACKING | 3173 3178 DRM_XE_GEM_CREATE_FLAG_SCANOUT | 3174 - DRM_XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM))) 3179 + DRM_XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM | 3180 + DRM_XE_GEM_CREATE_FLAG_NO_COMPRESSION))) 3175 3181 return -EINVAL; 3176 3182 3177 3183 if (XE_IOCTL_DBG(xe, args->handle)) ··· 3193 3197 3194 3198 if (args->flags & DRM_XE_GEM_CREATE_FLAG_SCANOUT) 3195 3199 bo_flags |= XE_BO_FLAG_SCANOUT; 3200 + 3201 + if (args->flags & DRM_XE_GEM_CREATE_FLAG_NO_COMPRESSION) { 3202 + if (XE_IOCTL_DBG(xe, GRAPHICS_VER(xe) < 20)) 3203 + return -EOPNOTSUPP; 3204 + bo_flags |= XE_BO_FLAG_NO_COMPRESSION; 3205 + } 3196 3206 3197 3207 bo_flags |= args->placement << (ffs(XE_BO_FLAG_SYSTEM) - 1); 3198 3208 ··· 3521 3519 * Compression implies coh_none, therefore we know for sure that WB 3522 3520 * memory can't currently use compression, which is likely one of the 3523 3521 * common cases. 3522 + * Additionally, userspace may explicitly request no compression via the 3523 + * DRM_XE_GEM_CREATE_FLAG_NO_COMPRESSION flag, which should also disable 3524 + * CCS usage. 3524 3525 */ 3525 - if (bo->cpu_caching == DRM_XE_GEM_CPU_CACHING_WB) 3526 + if (bo->cpu_caching == DRM_XE_GEM_CPU_CACHING_WB || 3527 + bo->flags & XE_BO_FLAG_NO_COMPRESSION) 3526 3528 return false; 3527 3529 3528 3530 return true;
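From userspace the new flag is just another bit in drm_xe_gem_create.flags, and per the check above it is only accepted on Xe2 (graphics version 20) and newer. A hedged sketch of requesting an uncompressed BO follows; the placement bit is a placeholder that would normally come from the memory-region query, and error handling is trimmed:

/* Illustrative userspace snippet; assumes an open render-node fd to an Xe2+ device. */
#include <errno.h>
#include <string.h>
#include <xf86drm.h>
#include <drm/xe_drm.h>

static int create_uncompressed_bo(int fd, __u64 size, __u32 *handle)
{
	struct drm_xe_gem_create create;

	memset(&create, 0, sizeof(create));
	create.size = size;
	create.placement = 1 << 0;	/* placeholder: region bit from DRM_XE_DEVICE_QUERY_MEM_REGIONS */
	create.cpu_caching = DRM_XE_GEM_CPU_CACHING_WC;
	create.flags = DRM_XE_GEM_CREATE_FLAG_NO_COMPRESSION;	/* opt out of CCS compression */

	if (drmIoctl(fd, DRM_IOCTL_XE_GEM_CREATE, &create))
		return -errno;

	*handle = create.handle;
	return 0;
}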
+1
drivers/gpu/drm/xe/xe_bo.h
···
 #define XE_BO_FLAG_GGTT3		BIT(23)
 #define XE_BO_FLAG_CPU_ADDR_MIRROR	BIT(24)
 #define XE_BO_FLAG_FORCE_USER_VRAM	BIT(25)
+#define XE_BO_FLAG_NO_COMPRESSION	BIT(26)

 /* this one is trigger internally only */
 #define XE_BO_FLAG_INTERNAL_TEST	BIT(30)
+121 -22
drivers/gpu/drm/xe/xe_debugfs.c
··· 68 68 struct xe_gt *gt; 69 69 u8 id; 70 70 71 - xe_pm_runtime_get(xe); 71 + guard(xe_pm_runtime)(xe); 72 72 73 73 drm_printf(&p, "graphics_verx100 %d\n", xe->info.graphics_verx100); 74 74 drm_printf(&p, "media_verx100 %d\n", xe->info.media_verx100); ··· 93 93 xe_force_wake_ref(gt_to_fw(gt), XE_FW_GT)); 94 94 drm_printf(&p, "gt%d engine_mask 0x%llx\n", id, 95 95 gt->info.engine_mask); 96 + drm_printf(&p, "gt%d multi_queue_engine_class_mask 0x%x\n", id, 97 + gt->info.multi_queue_engine_class_mask); 96 98 } 97 99 98 - xe_pm_runtime_put(xe); 99 100 return 0; 100 101 } 101 102 ··· 111 110 112 111 static int workarounds(struct xe_device *xe, struct drm_printer *p) 113 112 { 114 - xe_pm_runtime_get(xe); 113 + guard(xe_pm_runtime)(xe); 115 114 xe_wa_device_dump(xe, p); 116 - xe_pm_runtime_put(xe); 117 115 118 116 return 0; 119 117 } ··· 134 134 135 135 xe = node_to_xe(m->private); 136 136 p = drm_seq_file_printer(m); 137 - xe_pm_runtime_get(xe); 137 + guard(xe_pm_runtime)(xe); 138 138 mmio = xe_root_tile_mmio(xe); 139 139 static const struct { 140 140 u32 offset; ··· 151 151 for (int i = 0; i < ARRAY_SIZE(residencies); i++) 152 152 read_residency_counter(xe, mmio, residencies[i].offset, residencies[i].name, &p); 153 153 154 - xe_pm_runtime_put(xe); 155 154 return 0; 156 155 } 157 156 ··· 162 163 163 164 xe = node_to_xe(m->private); 164 165 p = drm_seq_file_printer(m); 165 - xe_pm_runtime_get(xe); 166 + guard(xe_pm_runtime)(xe); 166 167 mmio = xe_root_tile_mmio(xe); 167 168 168 169 static const struct { ··· 177 178 for (int i = 0; i < ARRAY_SIZE(residencies); i++) 178 179 read_residency_counter(xe, mmio, residencies[i].offset, residencies[i].name, &p); 179 180 180 - xe_pm_runtime_put(xe); 181 181 return 0; 182 182 } 183 183 ··· 275 277 276 278 xe->wedged.mode = wedged_mode; 277 279 278 - xe_pm_runtime_get(xe); 280 + guard(xe_pm_runtime)(xe); 279 281 for_each_gt(gt, xe, id) { 280 282 ret = xe_guc_ads_scheduler_policy_toggle_reset(&gt->uc.guc.ads); 281 283 if (ret) { 282 284 xe_gt_err(gt, "Failed to update GuC ADS scheduler policy. 
GuC may still cause engine reset even with wedged_mode=2\n"); 283 - xe_pm_runtime_put(xe); 284 285 return -EIO; 285 286 } 286 287 } 287 - xe_pm_runtime_put(xe); 288 288 289 289 return size; 290 290 } ··· 291 295 .owner = THIS_MODULE, 292 296 .read = wedged_mode_show, 293 297 .write = wedged_mode_set, 298 + }; 299 + 300 + static ssize_t page_reclaim_hw_assist_show(struct file *f, char __user *ubuf, 301 + size_t size, loff_t *pos) 302 + { 303 + struct xe_device *xe = file_inode(f)->i_private; 304 + char buf[8]; 305 + int len; 306 + 307 + len = scnprintf(buf, sizeof(buf), "%d\n", xe->info.has_page_reclaim_hw_assist); 308 + return simple_read_from_buffer(ubuf, size, pos, buf, len); 309 + } 310 + 311 + static ssize_t page_reclaim_hw_assist_set(struct file *f, const char __user *ubuf, 312 + size_t size, loff_t *pos) 313 + { 314 + struct xe_device *xe = file_inode(f)->i_private; 315 + bool val; 316 + ssize_t ret; 317 + 318 + ret = kstrtobool_from_user(ubuf, size, &val); 319 + if (ret) 320 + return ret; 321 + 322 + xe->info.has_page_reclaim_hw_assist = val; 323 + 324 + return size; 325 + } 326 + 327 + static const struct file_operations page_reclaim_hw_assist_fops = { 328 + .owner = THIS_MODULE, 329 + .read = page_reclaim_hw_assist_show, 330 + .write = page_reclaim_hw_assist_set, 294 331 }; 295 332 296 333 static ssize_t atomic_svm_timeslice_ms_show(struct file *f, char __user *ubuf, ··· 359 330 .owner = THIS_MODULE, 360 331 .read = atomic_svm_timeslice_ms_show, 361 332 .write = atomic_svm_timeslice_ms_set, 333 + }; 334 + 335 + static ssize_t min_run_period_lr_ms_show(struct file *f, char __user *ubuf, 336 + size_t size, loff_t *pos) 337 + { 338 + struct xe_device *xe = file_inode(f)->i_private; 339 + char buf[32]; 340 + int len = 0; 341 + 342 + len = scnprintf(buf, sizeof(buf), "%d\n", xe->min_run_period_lr_ms); 343 + 344 + return simple_read_from_buffer(ubuf, size, pos, buf, len); 345 + } 346 + 347 + static ssize_t min_run_period_lr_ms_set(struct file *f, const char __user *ubuf, 348 + size_t size, loff_t *pos) 349 + { 350 + struct xe_device *xe = file_inode(f)->i_private; 351 + u32 min_run_period_lr_ms; 352 + ssize_t ret; 353 + 354 + ret = kstrtouint_from_user(ubuf, size, 0, &min_run_period_lr_ms); 355 + if (ret) 356 + return ret; 357 + 358 + xe->min_run_period_lr_ms = min_run_period_lr_ms; 359 + 360 + return size; 361 + } 362 + 363 + static const struct file_operations min_run_period_lr_ms_fops = { 364 + .owner = THIS_MODULE, 365 + .read = min_run_period_lr_ms_show, 366 + .write = min_run_period_lr_ms_set, 367 + }; 368 + 369 + static ssize_t min_run_period_pf_ms_show(struct file *f, char __user *ubuf, 370 + size_t size, loff_t *pos) 371 + { 372 + struct xe_device *xe = file_inode(f)->i_private; 373 + char buf[32]; 374 + int len = 0; 375 + 376 + len = scnprintf(buf, sizeof(buf), "%d\n", xe->min_run_period_pf_ms); 377 + 378 + return simple_read_from_buffer(ubuf, size, pos, buf, len); 379 + } 380 + 381 + static ssize_t min_run_period_pf_ms_set(struct file *f, const char __user *ubuf, 382 + size_t size, loff_t *pos) 383 + { 384 + struct xe_device *xe = file_inode(f)->i_private; 385 + u32 min_run_period_pf_ms; 386 + ssize_t ret; 387 + 388 + ret = kstrtouint_from_user(ubuf, size, 0, &min_run_period_pf_ms); 389 + if (ret) 390 + return ret; 391 + 392 + xe->min_run_period_pf_ms = min_run_period_pf_ms; 393 + 394 + return size; 395 + } 396 + 397 + static const struct file_operations min_run_period_pf_ms_fops = { 398 + .owner = THIS_MODULE, 399 + .read = min_run_period_pf_ms_show, 400 + .write = 
min_run_period_pf_ms_set, 362 401 }; 363 402 364 403 static ssize_t disable_late_binding_show(struct file *f, char __user *ubuf, ··· 472 375 struct ttm_resource_manager *man; 473 376 struct xe_tile *tile; 474 377 struct xe_gt *gt; 475 - u32 mem_type; 476 378 u8 tile_id; 477 379 u8 id; 478 380 ··· 496 400 debugfs_create_file("atomic_svm_timeslice_ms", 0600, root, xe, 497 401 &atomic_svm_timeslice_ms_fops); 498 402 403 + debugfs_create_file("min_run_period_lr_ms", 0600, root, xe, 404 + &min_run_period_lr_ms_fops); 405 + 406 + debugfs_create_file("min_run_period_pf_ms", 0600, root, xe, 407 + &min_run_period_pf_ms_fops); 408 + 499 409 debugfs_create_file("disable_late_binding", 0600, root, xe, 500 410 &disable_late_binding_fops); 501 411 502 - for (mem_type = XE_PL_VRAM0; mem_type <= XE_PL_VRAM1; ++mem_type) { 503 - man = ttm_manager_type(bdev, mem_type); 504 - 505 - if (man) { 506 - char name[16]; 507 - 508 - snprintf(name, sizeof(name), "vram%d_mm", mem_type - XE_PL_VRAM0); 509 - ttm_resource_manager_create_debugfs(man, root, name); 510 - } 511 - } 412 + /* 413 + * Don't expose page reclaim configuration file if not supported by the 414 + * hardware initially. 415 + */ 416 + if (xe->info.has_page_reclaim_hw_assist) 417 + debugfs_create_file("page_reclaim_hw_assist", 0600, root, xe, 418 + &page_reclaim_hw_assist_fops); 512 419 513 420 man = ttm_manager_type(bdev, XE_PL_TT); 514 421 ttm_resource_manager_create_debugfs(man, root, "gtt_mm");
+12 -18
drivers/gpu/drm/xe/xe_devcoredump.c
··· 276 276 struct xe_devcoredump_snapshot *ss = container_of(work, typeof(*ss), work); 277 277 struct xe_devcoredump *coredump = container_of(ss, typeof(*coredump), snapshot); 278 278 struct xe_device *xe = coredump_to_xe(coredump); 279 - unsigned int fw_ref; 280 279 281 280 /* 282 281 * NB: Despite passing a GFP_ flags parameter here, more allocations are done ··· 286 287 xe_devcoredump_read, xe_devcoredump_free, 287 288 XE_COREDUMP_TIMEOUT_JIFFIES); 288 289 289 - xe_pm_runtime_get(xe); 290 + guard(xe_pm_runtime)(xe); 290 291 291 292 /* keep going if fw fails as we still want to save the memory and SW data */ 292 - fw_ref = xe_force_wake_get(gt_to_fw(ss->gt), XE_FORCEWAKE_ALL); 293 - if (!xe_force_wake_ref_has_domain(fw_ref, XE_FORCEWAKE_ALL)) 294 - xe_gt_info(ss->gt, "failed to get forcewake for coredump capture\n"); 295 - xe_vm_snapshot_capture_delayed(ss->vm); 296 - xe_guc_exec_queue_snapshot_capture_delayed(ss->ge); 297 - xe_force_wake_put(gt_to_fw(ss->gt), fw_ref); 293 + xe_with_force_wake(fw_ref, gt_to_fw(ss->gt), XE_FORCEWAKE_ALL) { 294 + if (!xe_force_wake_ref_has_domain(fw_ref.domains, XE_FORCEWAKE_ALL)) 295 + xe_gt_info(ss->gt, "failed to get forcewake for coredump capture\n"); 296 + xe_vm_snapshot_capture_delayed(ss->vm); 297 + xe_guc_exec_queue_snapshot_capture_delayed(ss->ge); 298 + } 298 299 299 300 ss->read.chunk_position = 0; 300 301 ··· 305 306 ss->read.buffer = kvmalloc(XE_DEVCOREDUMP_CHUNK_MAX, 306 307 GFP_USER); 307 308 if (!ss->read.buffer) 308 - goto put_pm; 309 + return; 309 310 310 311 __xe_devcoredump_read(ss->read.buffer, 311 312 XE_DEVCOREDUMP_CHUNK_MAX, ··· 313 314 } else { 314 315 ss->read.buffer = kvmalloc(ss->read.size, GFP_USER); 315 316 if (!ss->read.buffer) 316 - goto put_pm; 317 + return; 317 318 318 319 __xe_devcoredump_read(ss->read.buffer, ss->read.size, 0, 319 320 coredump); 320 321 xe_devcoredump_snapshot_free(ss); 321 322 } 322 - 323 - put_pm: 324 - xe_pm_runtime_put(xe); 325 323 } 326 324 327 325 static void devcoredump_snapshot(struct xe_devcoredump *coredump, ··· 328 332 struct xe_devcoredump_snapshot *ss = &coredump->snapshot; 329 333 struct xe_guc *guc = exec_queue_to_guc(q); 330 334 const char *process_name = "no process"; 331 - unsigned int fw_ref; 332 335 bool cookie; 333 336 334 337 ss->snapshot_time = ktime_get_real(); ··· 343 348 ss->gt = q->gt; 344 349 INIT_WORK(&ss->work, xe_devcoredump_deferred_snap_work); 345 350 346 - cookie = dma_fence_begin_signalling(); 347 - 348 351 /* keep going if fw fails as we still want to save the memory and SW data */ 349 - fw_ref = xe_force_wake_get(gt_to_fw(q->gt), XE_FORCEWAKE_ALL); 352 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(q->gt), XE_FORCEWAKE_ALL); 353 + 354 + cookie = dma_fence_begin_signalling(); 350 355 351 356 ss->guc.log = xe_guc_log_snapshot_capture(&guc->log, true); 352 357 ss->guc.ct = xe_guc_ct_snapshot_capture(&guc->ct); ··· 359 364 360 365 queue_work(system_unbound_wq, &ss->work); 361 366 362 - xe_force_wake_put(gt_to_fw(q->gt), fw_ref); 363 367 dma_fence_end_signalling(cookie); 364 368 } 365 369
+77 -23
drivers/gpu/drm/xe/xe_device.c
··· 166 166 struct xe_exec_queue *q; 167 167 unsigned long idx; 168 168 169 - xe_pm_runtime_get(xe); 169 + guard(xe_pm_runtime)(xe); 170 170 171 171 /* 172 172 * No need for exec_queue.lock here as there is no contention for it ··· 177 177 xa_for_each(&xef->exec_queue.xa, idx, q) { 178 178 if (q->vm && q->hwe->hw_engine_group) 179 179 xe_hw_engine_group_del_exec_queue(q->hwe->hw_engine_group, q); 180 - xe_exec_queue_kill(q); 180 + 181 + if (xe_exec_queue_is_multi_queue_primary(q)) 182 + xe_exec_queue_group_kill_put(q->multi_queue.group); 183 + else 184 + xe_exec_queue_kill(q); 185 + 181 186 xe_exec_queue_put(q); 182 187 } 183 188 xa_for_each(&xef->vm.xa, idx, vm) 184 189 xe_vm_close_and_put(vm); 185 190 186 191 xe_file_put(xef); 187 - 188 - xe_pm_runtime_put(xe); 189 192 } 190 193 191 194 static const struct drm_ioctl_desc xe_ioctls[] = { ··· 212 209 DRM_IOCTL_DEF_DRV(XE_MADVISE, xe_vm_madvise_ioctl, DRM_RENDER_ALLOW), 213 210 DRM_IOCTL_DEF_DRV(XE_VM_QUERY_MEM_RANGE_ATTRS, xe_vm_query_vmas_attrs_ioctl, 214 211 DRM_RENDER_ALLOW), 212 + DRM_IOCTL_DEF_DRV(XE_EXEC_QUEUE_SET_PROPERTY, xe_exec_queue_set_property_ioctl, 213 + DRM_RENDER_ALLOW), 215 214 }; 216 215 217 216 static long xe_drm_ioctl(struct file *file, unsigned int cmd, unsigned long arg) ··· 225 220 if (xe_device_wedged(xe)) 226 221 return -ECANCELED; 227 222 228 - ret = xe_pm_runtime_get_ioctl(xe); 223 + ACQUIRE(xe_pm_runtime_ioctl, pm)(xe); 224 + ret = ACQUIRE_ERR(xe_pm_runtime_ioctl, &pm); 229 225 if (ret >= 0) 230 226 ret = drm_ioctl(file, cmd, arg); 231 - xe_pm_runtime_put(xe); 232 227 233 228 return ret; 234 229 } ··· 243 238 if (xe_device_wedged(xe)) 244 239 return -ECANCELED; 245 240 246 - ret = xe_pm_runtime_get_ioctl(xe); 241 + ACQUIRE(xe_pm_runtime_ioctl, pm)(xe); 242 + ret = ACQUIRE_ERR(xe_pm_runtime_ioctl, &pm); 247 243 if (ret >= 0) 248 244 ret = drm_compat_ioctl(file, cmd, arg); 249 - xe_pm_runtime_put(xe); 250 245 251 246 return ret; 252 247 } ··· 460 455 xe->info.revid = pdev->revision; 461 456 xe->info.force_execlist = xe_modparam.force_execlist; 462 457 xe->atomic_svm_timeslice_ms = 5; 458 + xe->min_run_period_lr_ms = 5; 463 459 464 460 err = xe_irq_init(xe); 465 461 if (err) ··· 781 775 static int probe_has_flat_ccs(struct xe_device *xe) 782 776 { 783 777 struct xe_gt *gt; 784 - unsigned int fw_ref; 785 778 u32 reg; 786 779 787 780 /* Always enabled/disabled, no runtime check to do */ ··· 791 786 if (!gt) 792 787 return 0; 793 788 794 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); 795 - if (!fw_ref) 789 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT); 790 + if (!fw_ref.domains) 796 791 return -ETIMEDOUT; 797 792 798 793 reg = xe_gt_mcr_unicast_read_any(gt, XE2_FLAT_CCS_BASE_RANGE_LOWER); ··· 802 797 drm_dbg(&xe->drm, 803 798 "Flat CCS has been disabled in bios, May lead to performance impact"); 804 799 805 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 806 - 807 800 return 0; 801 + } 802 + 803 + /* 804 + * Detect if the driver is being run on pre-production hardware. We don't 805 + * keep workarounds for pre-production hardware long term, so print an 806 + * error and add taint if we're being loaded on a pre-production platform 807 + * for which the pre-prod workarounds have already been removed. 
808 + * 809 + * The general policy is that we'll remove any workarounds that only apply to 810 + * pre-production hardware around the time force_probe restrictions are lifted 811 + * for a platform of the next major IP generation (for example, Xe2 pre-prod 812 + * workarounds should be removed around the time the first Xe3 platforms have 813 + * force_probe lifted). 814 + */ 815 + static void detect_preproduction_hw(struct xe_device *xe) 816 + { 817 + struct xe_gt *gt; 818 + int id; 819 + 820 + /* 821 + * SR-IOV VFs don't have access to the FUSE2 register, so we can't 822 + * check pre-production status there. But the host OS will notice 823 + * and report the pre-production status, which should be enough to 824 + * help us catch mistaken use of pre-production hardware. 825 + */ 826 + if (IS_SRIOV_VF(xe)) 827 + return; 828 + 829 + /* 830 + * The "SW_CAP" fuse contains a bit indicating whether the device is a 831 + * production or pre-production device. This fuse is reflected through 832 + * the GT "FUSE2" register, even though the contents of the fuse are 833 + * not GT-specific. Every GT's reflection of this fuse should show the 834 + * same value, so we'll just use the first available GT for lookup. 835 + */ 836 + for_each_gt(gt, xe, id) 837 + break; 838 + 839 + if (!gt) 840 + return; 841 + 842 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT); 843 + if (!xe_force_wake_ref_has_domain(fw_ref.domains, XE_FW_GT)) { 844 + xe_gt_err(gt, "Forcewake failure; cannot determine production/pre-production hw status.\n"); 845 + return; 846 + } 847 + 848 + if (xe_mmio_read32(&gt->mmio, FUSE2) & PRODUCTION_HW) 849 + return; 850 + 851 + xe_info(xe, "Pre-production hardware detected.\n"); 852 + if (!xe->info.has_pre_prod_wa) { 853 + xe_err(xe, "Pre-production workarounds for this platform have already been removed.\n"); 854 + add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK); 855 + } 808 856 } 809 857 810 858 int xe_device_probe(struct xe_device *xe) ··· 1030 972 if (err) 1031 973 goto err_unregister_display; 1032 974 975 + detect_preproduction_hw(xe); 976 + 1033 977 return devm_add_action_or_reset(xe->drm.dev, xe_device_sanitize, xe); 1034 978 1035 979 err_unregister_display: ··· 1094 1034 */ 1095 1035 static void tdf_request_sync(struct xe_device *xe) 1096 1036 { 1097 - unsigned int fw_ref; 1098 1037 struct xe_gt *gt; 1099 1038 u8 id; 1100 1039 ··· 1101 1042 if (xe_gt_is_media_type(gt)) 1102 1043 continue; 1103 1044 1104 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); 1105 - if (!fw_ref) 1045 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT); 1046 + if (!fw_ref.domains) 1106 1047 return; 1107 1048 1108 1049 xe_mmio_write32(&gt->mmio, XE2_TDF_CTRL, TRANSIENT_FLUSH_REQUEST); ··· 1117 1058 if (xe_mmio_wait32(&gt->mmio, XE2_TDF_CTRL, TRANSIENT_FLUSH_REQUEST, 0, 1118 1059 300, NULL, false)) 1119 1060 xe_gt_err_once(gt, "TD flush timeout\n"); 1120 - 1121 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 1122 1061 } 1123 1062 } 1124 1063 1125 1064 void xe_device_l2_flush(struct xe_device *xe) 1126 1065 { 1127 1066 struct xe_gt *gt; 1128 - unsigned int fw_ref; 1129 1067 1130 1068 gt = xe_root_mmio_gt(xe); 1131 1069 if (!gt) ··· 1131 1075 if (!XE_GT_WA(gt, 16023588340)) 1132 1076 return; 1133 1077 1134 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); 1135 - if (!fw_ref) 1078 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT); 1079 + if (!fw_ref.domains) 1136 1080 return; 1137 1081 1138 1082 spin_lock(&gt->global_invl_lock); ··· 1142 1086 xe_gt_err_once(gt, "Global invalidation timeout\n"); 1143 
1087 1144 1088 spin_unlock(&gt->global_invl_lock); 1145 - 1146 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 1147 1089 } 1148 1090 1149 1091 /**
+5
drivers/gpu/drm/xe/xe_device.h
··· 172 172 return IS_DGFX(xe); 173 173 } 174 174 175 + static inline bool xe_device_has_mert(struct xe_device *xe) 176 + { 177 + return xe->info.has_mert; 178 + } 179 + 175 180 u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size); 176 181 177 182 void xe_device_snapshot_print(struct xe_device *xe, struct drm_printer *p);
+13 -20
drivers/gpu/drm/xe/xe_device_sysfs.c
··· 57 57 58 58 drm_dbg(&xe->drm, "vram_d3cold_threshold: %u\n", vram_d3cold_threshold); 59 59 60 - xe_pm_runtime_get(xe); 60 + guard(xe_pm_runtime)(xe); 61 61 ret = xe_pm_set_vram_threshold(xe, vram_d3cold_threshold); 62 - xe_pm_runtime_put(xe); 63 62 64 63 return ret ?: count; 65 64 } ··· 83 84 u16 major = 0, minor = 0, hotfix = 0, build = 0; 84 85 int ret; 85 86 86 - xe_pm_runtime_get(xe); 87 + guard(xe_pm_runtime)(xe); 87 88 88 89 ret = xe_pcode_read(root, PCODE_MBOX(PCODE_LATE_BINDING, GET_CAPABILITY_STATUS, 0), 89 90 &cap, NULL); 90 91 if (ret) 91 - goto out; 92 + return ret; 92 93 93 94 if (REG_FIELD_GET(V1_FAN_PROVISIONED, cap)) { 94 95 ret = xe_pcode_read(root, PCODE_MBOX(PCODE_LATE_BINDING, GET_VERSION_LOW, 0), 95 96 &ver_low, NULL); 96 97 if (ret) 97 - goto out; 98 + return ret; 98 99 99 100 ret = xe_pcode_read(root, PCODE_MBOX(PCODE_LATE_BINDING, GET_VERSION_HIGH, 0), 100 101 &ver_high, NULL); 101 102 if (ret) 102 - goto out; 103 + return ret; 103 104 104 105 major = REG_FIELD_GET(MAJOR_VERSION_MASK, ver_low); 105 106 minor = REG_FIELD_GET(MINOR_VERSION_MASK, ver_low); 106 107 hotfix = REG_FIELD_GET(HOTFIX_VERSION_MASK, ver_high); 107 108 build = REG_FIELD_GET(BUILD_VERSION_MASK, ver_high); 108 109 } 109 - out: 110 - xe_pm_runtime_put(xe); 111 110 112 - return ret ?: sysfs_emit(buf, "%u.%u.%u.%u\n", major, minor, hotfix, build); 111 + return sysfs_emit(buf, "%u.%u.%u.%u\n", major, minor, hotfix, build); 113 112 } 114 113 static DEVICE_ATTR_ADMIN_RO(lb_fan_control_version); 115 114 ··· 120 123 u16 major = 0, minor = 0, hotfix = 0, build = 0; 121 124 int ret; 122 125 123 - xe_pm_runtime_get(xe); 126 + guard(xe_pm_runtime)(xe); 124 127 125 128 ret = xe_pcode_read(root, PCODE_MBOX(PCODE_LATE_BINDING, GET_CAPABILITY_STATUS, 0), 126 129 &cap, NULL); 127 130 if (ret) 128 - goto out; 131 + return ret; 129 132 130 133 if (REG_FIELD_GET(VR_PARAMS_PROVISIONED, cap)) { 131 134 ret = xe_pcode_read(root, PCODE_MBOX(PCODE_LATE_BINDING, GET_VERSION_LOW, 0), 132 135 &ver_low, NULL); 133 136 if (ret) 134 - goto out; 137 + return ret; 135 138 136 139 ret = xe_pcode_read(root, PCODE_MBOX(PCODE_LATE_BINDING, GET_VERSION_HIGH, 0), 137 140 &ver_high, NULL); 138 141 if (ret) 139 - goto out; 142 + return ret; 140 143 141 144 major = REG_FIELD_GET(MAJOR_VERSION_MASK, ver_low); 142 145 minor = REG_FIELD_GET(MINOR_VERSION_MASK, ver_low); 143 146 hotfix = REG_FIELD_GET(HOTFIX_VERSION_MASK, ver_high); 144 147 build = REG_FIELD_GET(BUILD_VERSION_MASK, ver_high); 145 148 } 146 - out: 147 - xe_pm_runtime_put(xe); 148 149 149 - return ret ?: sysfs_emit(buf, "%u.%u.%u.%u\n", major, minor, hotfix, build); 150 + return sysfs_emit(buf, "%u.%u.%u.%u\n", major, minor, hotfix, build); 150 151 } 151 152 static DEVICE_ATTR_ADMIN_RO(lb_voltage_regulator_version); 152 153 ··· 228 233 struct xe_device *xe = pdev_to_xe_device(pdev); 229 234 u32 cap, val; 230 235 231 - xe_pm_runtime_get(xe); 236 + guard(xe_pm_runtime)(xe); 232 237 val = xe_mmio_read32(xe_root_tile_mmio(xe), BMG_PCIE_CAP); 233 - xe_pm_runtime_put(xe); 234 238 235 239 cap = REG_FIELD_GET(LINK_DOWNGRADE, val); 236 240 return sysfs_emit(buf, "%u\n", cap == DOWNGRADE_CAPABLE); ··· 245 251 u32 val = 0; 246 252 int ret; 247 253 248 - xe_pm_runtime_get(xe); 254 + guard(xe_pm_runtime)(xe); 249 255 ret = xe_pcode_read(xe_device_get_root_tile(xe), 250 256 PCODE_MBOX(DGFX_PCODE_STATUS, DGFX_GET_INIT_STATUS, 0), 251 257 &val, NULL); 252 - xe_pm_runtime_put(xe); 253 258 254 259 return ret ?: sysfs_emit(buf, "%u\n", REG_FIELD_GET(DGFX_LINK_DOWNGRADE_STATUS, val)); 255 260 }
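
The conversions above replace open-coded xe_pm_runtime_get()/xe_pm_runtime_put() pairs with guard(xe_pm_runtime)(xe), so the runtime-PM reference is dropped automatically on every return path and the "goto out" unwinding disappears. A minimal sketch of the resulting shape, assuming it lives in xe_device_sysfs.c where the needed headers are already included; the attribute itself is hypothetical, though the register read mirrors the BMG_PCIE_CAP read in the hunk above.

/*
 * Sketch only: a hypothetical sysfs show callback using the scope-based
 * runtime-PM guard. The attribute is illustrative; the
 * guard(xe_pm_runtime)(xe) usage mirrors the conversions in this hunk.
 */
static ssize_t example_cap_show(struct device *dev,
				struct device_attribute *attr, char *buf)
{
	struct pci_dev *pdev = to_pci_dev(dev);
	struct xe_device *xe = pdev_to_xe_device(pdev);
	u32 val;

	/* Holds a runtime-PM reference until the function returns. */
	guard(xe_pm_runtime)(xe);

	val = xe_mmio_read32(xe_root_tile_mmio(xe), BMG_PCIE_CAP);

	/* No explicit xe_pm_runtime_put(): the guard releases it on return. */
	return sysfs_emit(buf, "%#x\n", val);
}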
+27
drivers/gpu/drm/xe/xe_device_types.h
··· 17 17 #include "xe_late_bind_fw_types.h" 18 18 #include "xe_lmtt_types.h" 19 19 #include "xe_memirq_types.h" 20 + #include "xe_mert.h" 20 21 #include "xe_oa_types.h" 21 22 #include "xe_pagefault_types.h" 22 23 #include "xe_platform_types.h" ··· 184 183 * Media GT shares a pool with its primary GT. 185 184 */ 186 185 struct xe_sa_manager *kernel_bb_pool; 186 + 187 + /** 188 + * @mem.reclaim_pool: Pool for PRLs allocated. 189 + * 190 + * Only main GT has page reclaim list allocations. 191 + */ 192 + struct xe_sa_manager *reclaim_pool; 187 193 } mem; 188 194 189 195 /** @sriov: tile level virtualization data */ ··· 227 219 228 220 /** @debugfs: debugfs directory associated with this tile */ 229 221 struct dentry *debugfs; 222 + 223 + /** @mert: MERT-related data */ 224 + struct xe_mert mert; 230 225 }; 231 226 232 227 /** ··· 296 285 u8 has_asid:1; 297 286 /** @info.has_atomic_enable_pte_bit: Device has atomic enable PTE bit */ 298 287 u8 has_atomic_enable_pte_bit:1; 288 + /** @info.has_cached_pt: Supports caching pagetable */ 289 + u8 has_cached_pt:1; 299 290 /** @info.has_device_atomics_on_smem: Supports device atomics on SMEM */ 300 291 u8 has_device_atomics_on_smem:1; 301 292 /** @info.has_fan_control: Device supports fan control */ ··· 310 297 u8 has_heci_cscfi:1; 311 298 /** @info.has_heci_gscfi: device has heci gscfi */ 312 299 u8 has_heci_gscfi:1; 300 + /** @info.has_i2c: Device has I2C controller */ 301 + u8 has_i2c:1; 313 302 /** @info.has_late_bind: Device has firmware late binding support */ 314 303 u8 has_late_bind:1; 315 304 /** @info.has_llc: Device has a shared CPU+GPU last level cache */ ··· 322 307 u8 has_mbx_power_limits:1; 323 308 /** @info.has_mem_copy_instr: Device supports MEM_COPY instruction */ 324 309 u8 has_mem_copy_instr:1; 310 + /** @info.has_mert: Device has standalone MERT */ 311 + u8 has_mert:1; 312 + /** @info.has_page_reclaim_hw_assist: Device supports page reclamation feature */ 313 + u8 has_page_reclaim_hw_assist:1; 314 + /** @info.has_pre_prod_wa: Pre-production workarounds still present in driver */ 315 + u8 has_pre_prod_wa:1; 325 316 /** @info.has_pxp: Device has PXP support */ 326 317 u8 has_pxp:1; 327 318 /** @info.has_range_tlb_inval: Has range based TLB invalidations */ ··· 625 604 626 605 /** @atomic_svm_timeslice_ms: Atomic SVM fault timeslice MS */ 627 606 u32 atomic_svm_timeslice_ms; 607 + 608 + /** @min_run_period_lr_ms: LR VM (preempt fence mode) timeslice */ 609 + u32 min_run_period_lr_ms; 610 + 611 + /** @min_run_period_pf_ms: LR VM (page fault mode) timeslice */ 612 + u32 min_run_period_pf_ms; 628 613 629 614 #ifdef TEST_VM_OPS_ERROR 630 615 /**
+31 -36
drivers/gpu/drm/xe/xe_drm_client.c
··· 285 285 return NULL; 286 286 } 287 287 288 - static bool force_wake_get_any_engine(struct xe_device *xe, 289 - struct xe_hw_engine **phwe, 290 - unsigned int *pfw_ref) 288 + /* 289 + * Pick any engine and grab its forcewake. On error phwe will be NULL and 290 + * the returned forcewake reference will be invalid. Callers should check 291 + * phwe against NULL. 292 + */ 293 + static struct xe_force_wake_ref force_wake_get_any_engine(struct xe_device *xe, 294 + struct xe_hw_engine **phwe) 291 295 { 292 296 enum xe_force_wake_domains domain; 293 - unsigned int fw_ref; 297 + struct xe_force_wake_ref fw_ref = {}; 294 298 struct xe_hw_engine *hwe; 295 - struct xe_force_wake *fw; 299 + 300 + *phwe = NULL; 296 301 297 302 hwe = any_engine(xe); 298 303 if (!hwe) 299 - return false; 304 + return fw_ref; /* will be invalid */ 300 305 301 306 domain = xe_hw_engine_to_fw_domain(hwe); 302 - fw = gt_to_fw(hwe->gt); 303 307 304 - fw_ref = xe_force_wake_get(fw, domain); 305 - if (!xe_force_wake_ref_has_domain(fw_ref, domain)) { 306 - xe_force_wake_put(fw, fw_ref); 307 - return false; 308 - } 308 + fw_ref = xe_force_wake_constructor(gt_to_fw(hwe->gt), domain); 309 + if (xe_force_wake_ref_has_domain(fw_ref.domains, domain)) 310 + *phwe = hwe; /* valid forcewake */ 309 311 310 - *phwe = hwe; 311 - *pfw_ref = fw_ref; 312 - 313 - return true; 312 + return fw_ref; 314 313 } 315 314 316 315 static void show_run_ticks(struct drm_printer *p, struct drm_file *file) ··· 321 322 struct xe_hw_engine *hwe; 322 323 struct xe_exec_queue *q; 323 324 u64 gpu_timestamp; 324 - unsigned int fw_ref; 325 325 326 326 /* 327 327 * RING_TIMESTAMP registers are inaccessible in VF mode. ··· 337 339 wait_var_event(&xef->exec_queue.pending_removal, 338 340 !atomic_read(&xef->exec_queue.pending_removal)); 339 341 340 - xe_pm_runtime_get(xe); 341 - if (!force_wake_get_any_engine(xe, &hwe, &fw_ref)) { 342 - xe_pm_runtime_put(xe); 343 - return; 344 - } 342 + scoped_guard(xe_pm_runtime, xe) { 343 + CLASS(xe_force_wake_release_only, fw_ref)(force_wake_get_any_engine(xe, &hwe)); 344 + if (!hwe) 345 + return; 345 346 346 - /* Accumulate all the exec queues from this client */ 347 - mutex_lock(&xef->exec_queue.lock); 348 - xa_for_each(&xef->exec_queue.xa, i, q) { 349 - xe_exec_queue_get(q); 347 + /* Accumulate all the exec queues from this client */ 348 + mutex_lock(&xef->exec_queue.lock); 349 + xa_for_each(&xef->exec_queue.xa, i, q) { 350 + xe_exec_queue_get(q); 351 + mutex_unlock(&xef->exec_queue.lock); 352 + 353 + xe_exec_queue_update_run_ticks(q); 354 + 355 + mutex_lock(&xef->exec_queue.lock); 356 + xe_exec_queue_put(q); 357 + } 350 358 mutex_unlock(&xef->exec_queue.lock); 351 359 352 - xe_exec_queue_update_run_ticks(q); 353 - 354 - mutex_lock(&xef->exec_queue.lock); 355 - xe_exec_queue_put(q); 360 + gpu_timestamp = xe_hw_engine_read_timestamp(hwe); 356 361 } 357 - mutex_unlock(&xef->exec_queue.lock); 358 - 359 - gpu_timestamp = xe_hw_engine_read_timestamp(hwe); 360 - 361 - xe_force_wake_put(gt_to_fw(hwe->gt), fw_ref); 362 - xe_pm_runtime_put(xe); 363 362 364 363 for (class = 0; class < XE_ENGINE_CLASS_MAX; class++) { 365 364 const char *class_name;
+7 -2
drivers/gpu/drm/xe/xe_exec.c
··· 121 121 u64 addresses[XE_HW_ENGINE_MAX_INSTANCE]; 122 122 struct drm_gpuvm_exec vm_exec = {.extra.fn = xe_exec_fn}; 123 123 struct drm_exec *exec = &vm_exec.exec; 124 - u32 i, num_syncs, num_ufence = 0; 124 + u32 i, num_syncs, num_in_sync = 0, num_ufence = 0; 125 125 struct xe_validation_ctx ctx; 126 126 struct xe_sched_job *job; 127 127 struct xe_vm *vm; ··· 183 183 184 184 if (xe_sync_is_ufence(&syncs[num_syncs])) 185 185 num_ufence++; 186 + 187 + if (!num_in_sync && xe_sync_needs_wait(&syncs[num_syncs])) 188 + num_in_sync++; 186 189 } 187 190 188 191 if (XE_IOCTL_DBG(xe, num_ufence > 1)) { ··· 206 203 mode = xe_hw_engine_group_find_exec_mode(q); 207 204 208 205 if (mode == EXEC_MODE_DMA_FENCE) { 209 - err = xe_hw_engine_group_get_mode(group, mode, &previous_mode); 206 + err = xe_hw_engine_group_get_mode(group, mode, &previous_mode, 207 + syncs, num_in_sync ? 208 + num_syncs : 0); 210 209 if (err) 211 210 goto err_syncs; 212 211 }
+441 -14
drivers/gpu/drm/xe/xe_exec_queue.c
··· 13 13 #include <drm/drm_syncobj.h> 14 14 #include <uapi/drm/xe_drm.h> 15 15 16 + #include "xe_bo.h" 16 17 #include "xe_dep_scheduler.h" 17 18 #include "xe_device.h" 18 19 #include "xe_gt.h" ··· 54 53 * the ring operations the different engine classes support. 55 54 */ 56 55 56 + /** 57 + * DOC: Multi Queue Group 58 + * 59 + * Multi Queue Group is another mode of execution supported by the compute 60 + * and blitter copy command streamers (CCS and BCS, respectively). It is 61 + * an enhancement of the existing hardware architecture and leverages the 62 + * same submission model. It enables support for efficient, parallel 63 + * execution of multiple queues within a single shared context. The multi 64 + * queue group functionality is only supported with GuC submission backend. 65 + * All the queues of a group must use the same address space (VM). 66 + * 67 + * The DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE execution queue property 68 + * supports creating a multi queue group and adding queues to a queue group. 69 + * 70 + * The XE_EXEC_QUEUE_CREATE ioctl call with above property with value field 71 + * set to DRM_XE_MULTI_GROUP_CREATE, will create a new multi queue group with 72 + * the queue being created as the primary queue (aka q0) of the group. To add 73 + * secondary queues to the group, they need to be created with the above 74 + * property with id of the primary queue as the value. The properties of 75 + * the primary queue (like priority, time slice) applies to the whole group. 76 + * So, these properties can't be set for secondary queues of a group. 77 + * 78 + * The hardware does not support removing a queue from a multi-queue group. 79 + * However, queues can be dynamically added to the group. A group can have 80 + * up to 64 queues. To support this, XeKMD holds references to LRCs of the 81 + * queues even after the queues are destroyed by the user until the whole 82 + * group is destroyed. The secondary queues hold a reference to the primary 83 + * queue thus preventing the group from being destroyed when user destroys 84 + * the primary queue. Once the primary queue is destroyed, secondary queues 85 + * can't be added to the queue group, but they can continue to submit the 86 + * jobs if the DRM_XE_MULTI_GROUP_KEEP_ACTIVE flag is set during the multi 87 + * queue group creation. 88 + * 89 + * The queues of a multi queue group can set their priority within the group 90 + * through the DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY property. 91 + * This multi queue priority can also be set dynamically through the 92 + * XE_EXEC_QUEUE_SET_PROPERTY ioctl. This is the only other property 93 + * supported by the secondary queues of a multi queue group, other than 94 + * DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE. 95 + * 96 + * When GuC reports an error on any of the queues of a multi queue group, 97 + * the queue cleanup mechanism is invoked for all the queues of the group 98 + * as hardware cannot make progress on the multi queue context. 99 + * 100 + * Refer :ref:`multi-queue-group-guc-interface` for multi queue group GuC 101 + * interface. 
102 + */ 103 + 57 104 enum xe_exec_queue_sched_prop { 58 105 XE_EXEC_QUEUE_JOB_TIMEOUT = 0, 59 106 XE_EXEC_QUEUE_TIMESLICE = 1, ··· 110 61 }; 111 62 112 63 static int exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue *q, 113 - u64 extensions, int ext_number); 64 + u64 extensions); 65 + 66 + static void xe_exec_queue_group_cleanup(struct xe_exec_queue *q) 67 + { 68 + struct xe_exec_queue_group *group = q->multi_queue.group; 69 + struct xe_lrc *lrc; 70 + unsigned long idx; 71 + 72 + if (xe_exec_queue_is_multi_queue_secondary(q)) { 73 + /* 74 + * Put pairs with get from xe_exec_queue_lookup() call 75 + * in xe_exec_queue_group_validate(). 76 + */ 77 + xe_exec_queue_put(xe_exec_queue_multi_queue_primary(q)); 78 + return; 79 + } 80 + 81 + if (!group) 82 + return; 83 + 84 + /* Primary queue cleanup */ 85 + xa_for_each(&group->xa, idx, lrc) 86 + xe_lrc_put(lrc); 87 + 88 + xa_destroy(&group->xa); 89 + mutex_destroy(&group->list_lock); 90 + xe_bo_unpin_map_no_vm(group->cgp_bo); 91 + kfree(group); 92 + } 114 93 115 94 static void __xe_exec_queue_free(struct xe_exec_queue *q) 116 95 { ··· 150 73 151 74 if (xe_exec_queue_uses_pxp(q)) 152 75 xe_pxp_exec_queue_remove(gt_to_xe(q->gt)->pxp, q); 76 + 77 + if (xe_exec_queue_is_multi_queue(q)) 78 + xe_exec_queue_group_cleanup(q); 79 + 153 80 if (q->vm) 154 81 xe_vm_put(q->vm); 155 82 156 83 if (q->xef) 157 84 xe_file_put(q->xef); 158 85 86 + kvfree(q->replay_state); 159 87 kfree(q); 160 88 } 161 89 ··· 229 147 INIT_LIST_HEAD(&q->multi_gt_link); 230 148 INIT_LIST_HEAD(&q->hw_engine_group_link); 231 149 INIT_LIST_HEAD(&q->pxp.link); 150 + q->multi_queue.priority = XE_MULTI_QUEUE_PRIORITY_NORMAL; 232 151 233 152 q->sched_props.timeslice_us = hwe->eclass->sched_props.timeslice_us; 234 153 q->sched_props.preempt_timeout_us = ··· 258 175 * may set q->usm, must come before xe_lrc_create(), 259 176 * may overwrite q->sched_props, must come before q->ops->init() 260 177 */ 261 - err = exec_queue_user_extensions(xe, q, extensions, 0); 178 + err = exec_queue_user_extensions(xe, q, extensions); 262 179 if (err) { 263 180 __xe_exec_queue_free(q); 264 181 return ERR_PTR(err); ··· 308 225 struct xe_lrc *lrc; 309 226 310 227 xe_gt_sriov_vf_wait_valid_ggtt(q->gt); 311 - lrc = xe_lrc_create(q->hwe, q->vm, xe_lrc_ring_size(), 312 - q->msix_vec, flags); 228 + lrc = xe_lrc_create(q->hwe, q->vm, q->replay_state, 229 + xe_lrc_ring_size(), q->msix_vec, flags); 313 230 if (IS_ERR(lrc)) { 314 231 err = PTR_ERR(lrc); 315 232 goto err_lrc; ··· 465 382 return q; 466 383 } 467 384 ALLOW_ERROR_INJECTION(xe_exec_queue_create_bind, ERRNO); 385 + 386 + static void xe_exec_queue_group_kill(struct kref *ref) 387 + { 388 + struct xe_exec_queue_group *group = container_of(ref, struct xe_exec_queue_group, 389 + kill_refcount); 390 + xe_exec_queue_kill(group->primary); 391 + } 392 + 393 + static inline void xe_exec_queue_group_kill_get(struct xe_exec_queue_group *group) 394 + { 395 + kref_get(&group->kill_refcount); 396 + } 397 + 398 + void xe_exec_queue_group_kill_put(struct xe_exec_queue_group *group) 399 + { 400 + if (!group) 401 + return; 402 + 403 + kref_put(&group->kill_refcount, xe_exec_queue_group_kill); 404 + } 468 405 469 406 void xe_exec_queue_destroy(struct kref *ref) 470 407 { ··· 670 567 return xe_pxp_exec_queue_set_type(xe->pxp, q, DRM_XE_PXP_TYPE_HWDRM); 671 568 } 672 569 570 + static int exec_queue_set_hang_replay_state(struct xe_device *xe, 571 + struct xe_exec_queue *q, 572 + u64 value) 573 + { 574 + size_t size = xe_gt_lrc_hang_replay_size(q->gt, q->class); 
575 + u64 __user *address = u64_to_user_ptr(value); 576 + void *ptr; 577 + 578 + ptr = vmemdup_user(address, size); 579 + if (XE_IOCTL_DBG(xe, IS_ERR(ptr))) 580 + return PTR_ERR(ptr); 581 + 582 + q->replay_state = ptr; 583 + 584 + return 0; 585 + } 586 + 587 + static int xe_exec_queue_group_init(struct xe_device *xe, struct xe_exec_queue *q) 588 + { 589 + struct xe_tile *tile = gt_to_tile(q->gt); 590 + struct xe_exec_queue_group *group; 591 + struct xe_bo *bo; 592 + 593 + group = kzalloc(sizeof(*group), GFP_KERNEL); 594 + if (!group) 595 + return -ENOMEM; 596 + 597 + bo = xe_bo_create_pin_map_novm(xe, tile, SZ_4K, ttm_bo_type_kernel, 598 + XE_BO_FLAG_VRAM_IF_DGFX(tile) | 599 + XE_BO_FLAG_PINNED_LATE_RESTORE | 600 + XE_BO_FLAG_FORCE_USER_VRAM | 601 + XE_BO_FLAG_GGTT_INVALIDATE | 602 + XE_BO_FLAG_GGTT, false); 603 + if (IS_ERR(bo)) { 604 + drm_err(&xe->drm, "CGP bo allocation for queue group failed: %ld\n", 605 + PTR_ERR(bo)); 606 + kfree(group); 607 + return PTR_ERR(bo); 608 + } 609 + 610 + xe_map_memset(xe, &bo->vmap, 0, 0, SZ_4K); 611 + 612 + group->primary = q; 613 + group->cgp_bo = bo; 614 + INIT_LIST_HEAD(&group->list); 615 + kref_init(&group->kill_refcount); 616 + xa_init_flags(&group->xa, XA_FLAGS_ALLOC1); 617 + mutex_init(&group->list_lock); 618 + q->multi_queue.group = group; 619 + 620 + /* group->list_lock is used in submission backend */ 621 + if (IS_ENABLED(CONFIG_LOCKDEP)) { 622 + fs_reclaim_acquire(GFP_KERNEL); 623 + might_lock(&group->list_lock); 624 + fs_reclaim_release(GFP_KERNEL); 625 + } 626 + 627 + return 0; 628 + } 629 + 630 + static inline bool xe_exec_queue_supports_multi_queue(struct xe_exec_queue *q) 631 + { 632 + return q->gt->info.multi_queue_engine_class_mask & BIT(q->class); 633 + } 634 + 635 + static int xe_exec_queue_group_validate(struct xe_device *xe, struct xe_exec_queue *q, 636 + u32 primary_id) 637 + { 638 + struct xe_exec_queue_group *group; 639 + struct xe_exec_queue *primary; 640 + int ret; 641 + 642 + /* 643 + * Get from below xe_exec_queue_lookup() pairs with put 644 + * in xe_exec_queue_group_cleanup(). 
645 + */ 646 + primary = xe_exec_queue_lookup(q->vm->xef, primary_id); 647 + if (XE_IOCTL_DBG(xe, !primary)) 648 + return -ENOENT; 649 + 650 + if (XE_IOCTL_DBG(xe, !xe_exec_queue_is_multi_queue_primary(primary)) || 651 + XE_IOCTL_DBG(xe, q->vm != primary->vm) || 652 + XE_IOCTL_DBG(xe, q->logical_mask != primary->logical_mask)) { 653 + ret = -EINVAL; 654 + goto put_primary; 655 + } 656 + 657 + group = primary->multi_queue.group; 658 + q->multi_queue.valid = true; 659 + q->multi_queue.group = group; 660 + 661 + return 0; 662 + put_primary: 663 + xe_exec_queue_put(primary); 664 + return ret; 665 + } 666 + 667 + #define XE_MAX_GROUP_SIZE 64 668 + static int xe_exec_queue_group_add(struct xe_device *xe, struct xe_exec_queue *q) 669 + { 670 + struct xe_exec_queue_group *group = q->multi_queue.group; 671 + u32 pos; 672 + int err; 673 + 674 + xe_assert(xe, xe_exec_queue_is_multi_queue_secondary(q)); 675 + 676 + /* Primary queue holds a reference to LRCs of all secondary queues */ 677 + err = xa_alloc(&group->xa, &pos, xe_lrc_get(q->lrc[0]), 678 + XA_LIMIT(1, XE_MAX_GROUP_SIZE - 1), GFP_KERNEL); 679 + if (XE_IOCTL_DBG(xe, err)) { 680 + xe_lrc_put(q->lrc[0]); 681 + 682 + /* It is invalid if queue group limit is exceeded */ 683 + if (err == -EBUSY) 684 + err = -EINVAL; 685 + 686 + return err; 687 + } 688 + 689 + q->multi_queue.pos = pos; 690 + 691 + if (group->primary->multi_queue.keep_active) { 692 + xe_exec_queue_group_kill_get(group); 693 + q->multi_queue.keep_active = true; 694 + } 695 + 696 + return 0; 697 + } 698 + 699 + static void xe_exec_queue_group_delete(struct xe_device *xe, struct xe_exec_queue *q) 700 + { 701 + struct xe_exec_queue_group *group = q->multi_queue.group; 702 + struct xe_lrc *lrc; 703 + 704 + xe_assert(xe, xe_exec_queue_is_multi_queue_secondary(q)); 705 + 706 + lrc = xa_erase(&group->xa, q->multi_queue.pos); 707 + xe_assert(xe, lrc); 708 + xe_lrc_put(lrc); 709 + 710 + if (q->multi_queue.keep_active) { 711 + xe_exec_queue_group_kill_put(group); 712 + q->multi_queue.keep_active = false; 713 + } 714 + } 715 + 716 + static int exec_queue_set_multi_group(struct xe_device *xe, struct xe_exec_queue *q, 717 + u64 value) 718 + { 719 + if (XE_IOCTL_DBG(xe, !xe_exec_queue_supports_multi_queue(q))) 720 + return -ENODEV; 721 + 722 + if (XE_IOCTL_DBG(xe, !xe_device_uc_enabled(xe))) 723 + return -EOPNOTSUPP; 724 + 725 + if (XE_IOCTL_DBG(xe, !q->vm->xef)) 726 + return -EINVAL; 727 + 728 + if (XE_IOCTL_DBG(xe, xe_exec_queue_is_parallel(q))) 729 + return -EINVAL; 730 + 731 + if (XE_IOCTL_DBG(xe, xe_exec_queue_is_multi_queue(q))) 732 + return -EINVAL; 733 + 734 + if (value & DRM_XE_MULTI_GROUP_CREATE) { 735 + if (XE_IOCTL_DBG(xe, value & ~(DRM_XE_MULTI_GROUP_CREATE | 736 + DRM_XE_MULTI_GROUP_KEEP_ACTIVE))) 737 + return -EINVAL; 738 + 739 + /* 740 + * KEEP_ACTIVE is not supported in preempt fence mode as in that mode, 741 + * VM_DESTROY ioctl expects all exec queues of that VM are already killed. 
742 + */ 743 + if (XE_IOCTL_DBG(xe, (value & DRM_XE_MULTI_GROUP_KEEP_ACTIVE) && 744 + xe_vm_in_preempt_fence_mode(q->vm))) 745 + return -EINVAL; 746 + 747 + q->multi_queue.valid = true; 748 + q->multi_queue.is_primary = true; 749 + q->multi_queue.pos = 0; 750 + if (value & DRM_XE_MULTI_GROUP_KEEP_ACTIVE) 751 + q->multi_queue.keep_active = true; 752 + 753 + return 0; 754 + } 755 + 756 + /* While adding secondary queues, the upper 32 bits must be 0 */ 757 + if (XE_IOCTL_DBG(xe, value & (~0ull << 32))) 758 + return -EINVAL; 759 + 760 + return xe_exec_queue_group_validate(xe, q, value); 761 + } 762 + 763 + static int exec_queue_set_multi_queue_priority(struct xe_device *xe, struct xe_exec_queue *q, 764 + u64 value) 765 + { 766 + if (XE_IOCTL_DBG(xe, value > XE_MULTI_QUEUE_PRIORITY_HIGH)) 767 + return -EINVAL; 768 + 769 + /* For queue creation time (!q->xef) setting, just store the priority value */ 770 + if (!q->xef) { 771 + q->multi_queue.priority = value; 772 + return 0; 773 + } 774 + 775 + if (!xe_exec_queue_is_multi_queue(q)) 776 + return -EINVAL; 777 + 778 + return q->ops->set_multi_queue_priority(q, value); 779 + } 780 + 673 781 typedef int (*xe_exec_queue_set_property_fn)(struct xe_device *xe, 674 782 struct xe_exec_queue *q, 675 783 u64 value); ··· 889 575 [DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY] = exec_queue_set_priority, 890 576 [DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE] = exec_queue_set_timeslice, 891 577 [DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE] = exec_queue_set_pxp_type, 578 + [DRM_XE_EXEC_QUEUE_SET_HANG_REPLAY_STATE] = exec_queue_set_hang_replay_state, 579 + [DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP] = exec_queue_set_multi_group, 580 + [DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY] = 581 + exec_queue_set_multi_queue_priority, 892 582 }; 583 + 584 + int xe_exec_queue_set_property_ioctl(struct drm_device *dev, void *data, 585 + struct drm_file *file) 586 + { 587 + struct xe_device *xe = to_xe_device(dev); 588 + struct xe_file *xef = to_xe_file(file); 589 + struct drm_xe_exec_queue_set_property *args = data; 590 + struct xe_exec_queue *q; 591 + int ret; 592 + u32 idx; 593 + 594 + if (XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1])) 595 + return -EINVAL; 596 + 597 + if (XE_IOCTL_DBG(xe, args->property != 598 + DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY)) 599 + return -EINVAL; 600 + 601 + q = xe_exec_queue_lookup(xef, args->exec_queue_id); 602 + if (XE_IOCTL_DBG(xe, !q)) 603 + return -ENOENT; 604 + 605 + idx = array_index_nospec(args->property, 606 + ARRAY_SIZE(exec_queue_set_property_funcs)); 607 + ret = exec_queue_set_property_funcs[idx](xe, q, args->value); 608 + if (XE_IOCTL_DBG(xe, ret)) 609 + goto err_post_lookup; 610 + 611 + xe_exec_queue_put(q); 612 + return 0; 613 + 614 + err_post_lookup: 615 + xe_exec_queue_put(q); 616 + return ret; 617 + } 618 + 619 + static int exec_queue_user_ext_check(struct xe_exec_queue *q, u64 properties) 620 + { 621 + u64 secondary_queue_valid_props = BIT_ULL(DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP) | 622 + BIT_ULL(DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY); 623 + 624 + /* 625 + * Only MULTI_QUEUE_PRIORITY property is valid for secondary queues of a 626 + * multi-queue group. 
627 + */ 628 + if (xe_exec_queue_is_multi_queue_secondary(q) && 629 + properties & ~secondary_queue_valid_props) 630 + return -EINVAL; 631 + 632 + return 0; 633 + } 634 + 635 + static int exec_queue_user_ext_check_final(struct xe_exec_queue *q, u64 properties) 636 + { 637 + /* MULTI_QUEUE_PRIORITY only applies to multi-queue group queues */ 638 + if ((properties & BIT_ULL(DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY)) && 639 + !(properties & BIT_ULL(DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP))) 640 + return -EINVAL; 641 + 642 + return 0; 643 + } 893 644 894 645 static int exec_queue_user_ext_set_property(struct xe_device *xe, 895 646 struct xe_exec_queue *q, 896 - u64 extension) 647 + u64 extension, u64 *properties) 897 648 { 898 649 u64 __user *address = u64_to_user_ptr(extension); 899 650 struct drm_xe_ext_set_property ext; ··· 974 595 XE_IOCTL_DBG(xe, ext.pad) || 975 596 XE_IOCTL_DBG(xe, ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY && 976 597 ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE && 977 - ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE)) 598 + ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE && 599 + ext.property != DRM_XE_EXEC_QUEUE_SET_HANG_REPLAY_STATE && 600 + ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP && 601 + ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY)) 978 602 return -EINVAL; 979 603 980 604 idx = array_index_nospec(ext.property, ARRAY_SIZE(exec_queue_set_property_funcs)); 981 605 if (!exec_queue_set_property_funcs[idx]) 982 606 return -EINVAL; 983 607 608 + *properties |= BIT_ULL(idx); 609 + err = exec_queue_user_ext_check(q, *properties); 610 + if (XE_IOCTL_DBG(xe, err)) 611 + return err; 612 + 984 613 return exec_queue_set_property_funcs[idx](xe, q, ext.value); 985 614 } 986 615 987 616 typedef int (*xe_exec_queue_user_extension_fn)(struct xe_device *xe, 988 617 struct xe_exec_queue *q, 989 - u64 extension); 618 + u64 extension, u64 *properties); 990 619 991 620 static const xe_exec_queue_user_extension_fn exec_queue_user_extension_funcs[] = { 992 621 [DRM_XE_EXEC_QUEUE_EXTENSION_SET_PROPERTY] = exec_queue_user_ext_set_property, 993 622 }; 994 623 995 624 #define MAX_USER_EXTENSIONS 16 996 - static int exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue *q, 997 - u64 extensions, int ext_number) 625 + static int __exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue *q, 626 + u64 extensions, int ext_number, u64 *properties) 998 627 { 999 628 u64 __user *address = u64_to_user_ptr(extensions); 1000 629 struct drm_xe_user_extension ext; ··· 1023 636 1024 637 idx = array_index_nospec(ext.name, 1025 638 ARRAY_SIZE(exec_queue_user_extension_funcs)); 1026 - err = exec_queue_user_extension_funcs[idx](xe, q, extensions); 639 + err = exec_queue_user_extension_funcs[idx](xe, q, extensions, properties); 1027 640 if (XE_IOCTL_DBG(xe, err)) 1028 641 return err; 1029 642 1030 643 if (ext.next_extension) 1031 - return exec_queue_user_extensions(xe, q, ext.next_extension, 1032 - ++ext_number); 644 + return __exec_queue_user_extensions(xe, q, ext.next_extension, 645 + ++ext_number, properties); 646 + 647 + return 0; 648 + } 649 + 650 + static int exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue *q, 651 + u64 extensions) 652 + { 653 + u64 properties = 0; 654 + int err; 655 + 656 + err = __exec_queue_user_extensions(xe, q, extensions, 0, &properties); 657 + if (XE_IOCTL_DBG(xe, err)) 658 + return err; 659 + 660 + err = 
exec_queue_user_ext_check_final(q, properties); 661 + if (XE_IOCTL_DBG(xe, err)) 662 + return err; 663 + 664 + if (xe_exec_queue_is_multi_queue_primary(q)) { 665 + err = xe_exec_queue_group_init(xe, q); 666 + if (XE_IOCTL_DBG(xe, err)) 667 + return err; 668 + } 1033 669 1034 670 return 0; 1035 671 } ··· 1208 798 if (IS_ERR(q)) 1209 799 return PTR_ERR(q); 1210 800 801 + if (xe_exec_queue_is_multi_queue_secondary(q)) { 802 + err = xe_exec_queue_group_add(xe, q); 803 + if (XE_IOCTL_DBG(xe, err)) 804 + goto put_exec_queue; 805 + } 806 + 1211 807 if (xe_vm_in_preempt_fence_mode(vm)) { 1212 808 q->lr.context = dma_fence_context_alloc(1); 1213 809 1214 810 err = xe_vm_add_compute_exec_queue(vm, q); 1215 811 if (XE_IOCTL_DBG(xe, err)) 1216 - goto put_exec_queue; 812 + goto delete_queue_group; 1217 813 } 1218 814 1219 815 if (q->vm && q->hwe->hw_engine_group) { ··· 1242 826 1243 827 kill_exec_queue: 1244 828 xe_exec_queue_kill(q); 829 + delete_queue_group: 830 + if (xe_exec_queue_is_multi_queue_secondary(q)) 831 + xe_exec_queue_group_delete(xe, q); 1245 832 put_exec_queue: 1246 833 xe_exec_queue_put(q); 1247 834 return err; ··· 1400 981 1401 982 q->ops->kill(q); 1402 983 xe_vm_remove_compute_exec_queue(q->vm, q); 984 + 985 + if (!xe_exec_queue_is_multi_queue_primary(q) && q->multi_queue.keep_active) { 986 + xe_exec_queue_group_kill_put(q->multi_queue.group); 987 + q->multi_queue.keep_active = false; 988 + } 1403 989 } 1404 990 1405 991 int xe_exec_queue_destroy_ioctl(struct drm_device *dev, void *data, ··· 1431 1007 if (q->vm && q->hwe->hw_engine_group) 1432 1008 xe_hw_engine_group_del_exec_queue(q->hwe->hw_engine_group, q); 1433 1009 1434 - xe_exec_queue_kill(q); 1010 + if (xe_exec_queue_is_multi_queue_primary(q)) 1011 + xe_exec_queue_group_kill_put(q->multi_queue.group); 1012 + else 1013 + xe_exec_queue_kill(q); 1435 1014 1436 1015 trace_xe_exec_queue_close(q); 1437 1016 xe_exec_queue_put(q);
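
The multi queue group documentation above maps onto a straightforward uapi flow: create the primary queue (q0) with the MULTI_GROUP property set to DRM_XE_MULTI_GROUP_CREATE (optionally ORed with DRM_XE_MULTI_GROUP_KEEP_ACTIVE), then create secondary queues on the same VM and engine placement with the property value set to q0's exec queue id. A hedged userspace sketch of that extension chaining follows; it assumes the drm_xe_exec_queue_create and drm_xe_ext_set_property layouts from uapi xe_drm.h, elides error handling, and takes fd, vm_id and the engine instance from setup not shown here.

/*
 * Sketch under assumptions: field names follow uapi/drm/xe_drm.h; error
 * handling is elided and fd/vm_id/eci come from earlier setup not shown.
 */
#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/xe_drm.h>

static uint32_t create_group_queue(int fd, uint32_t vm_id,
				   struct drm_xe_engine_class_instance *eci,
				   uint64_t multi_group_value)
{
	struct drm_xe_ext_set_property ext = {
		.base.name = DRM_XE_EXEC_QUEUE_EXTENSION_SET_PROPERTY,
		.property = DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP,
		.value = multi_group_value,
	};
	struct drm_xe_exec_queue_create create = {
		.extensions = (uintptr_t)&ext,
		.width = 1,
		.num_placements = 1,
		.vm_id = vm_id,
		.instances = (uintptr_t)eci,
	};

	ioctl(fd, DRM_IOCTL_XE_EXEC_QUEUE_CREATE, &create);
	return create.exec_queue_id;
}

/*
 * Primary queue (q0) of a new group:
 *   q0 = create_group_queue(fd, vm_id, &eci, DRM_XE_MULTI_GROUP_CREATE);
 * Secondary queue joining q0's group (same VM, same placement):
 *   q1 = create_group_queue(fd, vm_id, &eci, q0);
 */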
+68
drivers/gpu/drm/xe/xe_exec_queue.h
··· 66 66 return q->pxp.type; 67 67 } 68 68 69 + /** 70 + * xe_exec_queue_is_multi_queue() - Whether an exec_queue is part of a queue group. 71 + * @q: The exec_queue 72 + * 73 + * Return: True if the exec_queue is part of a queue group, false otherwise. 74 + */ 75 + static inline bool xe_exec_queue_is_multi_queue(struct xe_exec_queue *q) 76 + { 77 + return q->multi_queue.valid; 78 + } 79 + 80 + /** 81 + * xe_exec_queue_is_multi_queue_primary() - Whether an exec_queue is primary queue 82 + * of a multi queue group. 83 + * @q: The exec_queue 84 + * 85 + * Return: True if @q is primary queue of a queue group, false otherwise. 86 + */ 87 + static inline bool xe_exec_queue_is_multi_queue_primary(struct xe_exec_queue *q) 88 + { 89 + return q->multi_queue.is_primary; 90 + } 91 + 92 + /** 93 + * xe_exec_queue_is_multi_queue_secondary() - Whether an exec_queue is secondary queue 94 + * of a multi queue group. 95 + * @q: The exec_queue 96 + * 97 + * Return: True if @q is secondary queue of a queue group, false otherwise. 98 + */ 99 + static inline bool xe_exec_queue_is_multi_queue_secondary(struct xe_exec_queue *q) 100 + { 101 + return xe_exec_queue_is_multi_queue(q) && !xe_exec_queue_is_multi_queue_primary(q); 102 + } 103 + 104 + /** 105 + * xe_exec_queue_multi_queue_primary() - Get multi queue group's primary queue 106 + * @q: The exec_queue 107 + * 108 + * If @q belongs to a multi queue group, then the primary queue of the group will 109 + * be returned. Otherwise, @q will be returned. 110 + */ 111 + static inline struct xe_exec_queue *xe_exec_queue_multi_queue_primary(struct xe_exec_queue *q) 112 + { 113 + return xe_exec_queue_is_multi_queue(q) ? q->multi_queue.group->primary : q; 114 + } 115 + 116 + void xe_exec_queue_group_kill_put(struct xe_exec_queue_group *group); 117 + 69 118 bool xe_exec_queue_is_lr(struct xe_exec_queue *q); 70 119 71 120 bool xe_exec_queue_is_idle(struct xe_exec_queue *q); ··· 126 77 int xe_exec_queue_destroy_ioctl(struct drm_device *dev, void *data, 127 78 struct drm_file *file); 128 79 int xe_exec_queue_get_property_ioctl(struct drm_device *dev, void *data, 80 + struct drm_file *file); 81 + int xe_exec_queue_set_property_ioctl(struct drm_device *dev, void *data, 129 82 struct drm_file *file); 130 83 enum xe_exec_queue_priority xe_exec_queue_device_get_max_priority(struct xe_device *xe); 131 84 ··· 161 110 int xe_exec_queue_contexts_hwsp_rebase(struct xe_exec_queue *q, void *scratch); 162 111 163 112 struct xe_lrc *xe_exec_queue_lrc(struct xe_exec_queue *q); 113 + 114 + /** 115 + * xe_exec_queue_idle_skip_suspend() - Can exec queue skip suspend 116 + * @q: The exec_queue 117 + * 118 + * If an exec queue is not parallel and is idle, the suspend steps can be 119 + * skipped in the submission backend immediatley signaling the suspend fence. 120 + * Parallel queues cannot skip this step due to limitations in the submission 121 + * backend. 122 + * 123 + * Return: True if exec queue is idle and can skip suspend steps, False 124 + * otherwise 125 + */ 126 + static inline bool xe_exec_queue_idle_skip_suspend(struct xe_exec_queue *q) 127 + { 128 + return !xe_exec_queue_is_parallel(q) && xe_exec_queue_is_idle(q); 129 + } 164 130 165 131 #endif
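
The DOC block in xe_exec_queue.c states that group-wide scheduling properties (priority, timeslice) belong to the primary queue only; the predicates declared above make that rule easy to enforce at a call site. A small illustrative helper, not part of this series, assuming only the inline functions shown here:

/*
 * Sketch only: illustrative check built on the new predicates. Per the
 * multi queue group documentation, group-wide scheduling properties may
 * only be applied through the primary queue (q0).
 */
static bool example_can_set_group_property(struct xe_exec_queue *q)
{
	/* Queues outside any group keep the existing per-queue behaviour. */
	if (!xe_exec_queue_is_multi_queue(q))
		return true;

	/* Within a group, only the primary queue owns these properties. */
	return xe_exec_queue_is_multi_queue_primary(q);
}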
+62
drivers/gpu/drm/xe/xe_exec_queue_types.h
··· 33 33 }; 34 34 35 35 /** 36 + * enum xe_multi_queue_priority - Multi Queue priority values 37 + * 38 + * The priority values of the queues within the multi queue group. 39 + */ 40 + enum xe_multi_queue_priority { 41 + /** @XE_MULTI_QUEUE_PRIORITY_LOW: Priority low */ 42 + XE_MULTI_QUEUE_PRIORITY_LOW = 0, 43 + /** @XE_MULTI_QUEUE_PRIORITY_NORMAL: Priority normal */ 44 + XE_MULTI_QUEUE_PRIORITY_NORMAL, 45 + /** @XE_MULTI_QUEUE_PRIORITY_HIGH: Priority high */ 46 + XE_MULTI_QUEUE_PRIORITY_HIGH, 47 + }; 48 + 49 + /** 50 + * struct xe_exec_queue_group - Execution multi queue group 51 + * 52 + * Contains multi queue group information. 53 + */ 54 + struct xe_exec_queue_group { 55 + /** @primary: Primary queue of this group */ 56 + struct xe_exec_queue *primary; 57 + /** @cgp_bo: BO for the Context Group Page */ 58 + struct xe_bo *cgp_bo; 59 + /** @xa: xarray to store LRCs */ 60 + struct xarray xa; 61 + /** @list: List of all secondary queues in the group */ 62 + struct list_head list; 63 + /** @list_lock: Secondary queue list lock */ 64 + struct mutex list_lock; 65 + /** @kill_refcount: ref count to kill primary queue */ 66 + struct kref kill_refcount; 67 + /** @sync_pending: CGP_SYNC_DONE g2h response pending */ 68 + bool sync_pending; 69 + /** @banned: Group banned */ 70 + bool banned; 71 + }; 72 + 73 + /** 36 74 * struct xe_exec_queue - Execution queue 37 75 * 38 76 * Contains all state necessary for submissions. Can either be a user object or ··· 149 111 struct xe_guc_exec_queue *guc; 150 112 }; 151 113 114 + /** @multi_queue: Multi queue information */ 115 + struct { 116 + /** @multi_queue.group: Queue group information */ 117 + struct xe_exec_queue_group *group; 118 + /** @multi_queue.link: Link into group's secondary queues list */ 119 + struct list_head link; 120 + /** @multi_queue.priority: Queue priority within the multi-queue group */ 121 + enum xe_multi_queue_priority priority; 122 + /** @multi_queue.pos: Position of queue within the multi-queue group */ 123 + u8 pos; 124 + /** @multi_queue.valid: Queue belongs to a multi queue group */ 125 + u8 valid:1; 126 + /** @multi_queue.is_primary: Is primary queue (Q0) of the group */ 127 + u8 is_primary:1; 128 + /** @multi_queue.keep_active: Keep the group active after primary is destroyed */ 129 + u8 keep_active:1; 130 + } multi_queue; 131 + 152 132 /** @sched_props: scheduling properties */ 153 133 struct { 154 134 /** @sched_props.timeslice_us: timeslice period in micro-seconds */ ··· 223 167 /** @ufence_timeline_value: User fence timeline value */ 224 168 u64 ufence_timeline_value; 225 169 170 + /** @replay_state: GPU hang replay state */ 171 + void *replay_state; 172 + 226 173 /** @ops: submission backend exec queue operations */ 227 174 const struct xe_exec_queue_ops *ops; 228 175 ··· 272 213 int (*set_timeslice)(struct xe_exec_queue *q, u32 timeslice_us); 273 214 /** @set_preempt_timeout: Set preemption timeout for exec queue */ 274 215 int (*set_preempt_timeout)(struct xe_exec_queue *q, u32 preempt_timeout_us); 216 + /** @set_multi_queue_priority: Set multi queue priority */ 217 + int (*set_multi_queue_priority)(struct xe_exec_queue *q, 218 + enum xe_multi_queue_priority priority); 275 219 /** 276 220 * @suspend: Suspend exec queue from executing, allowed to be called 277 221 * multiple times in a row before resume with the caveat that
+1 -1
drivers/gpu/drm/xe/xe_execlist.c
··· 269 269 270 270 port->hwe = hwe; 271 271 272 - port->lrc = xe_lrc_create(hwe, NULL, SZ_16K, XE_IRQ_DEFAULT_MSIX, 0); 272 + port->lrc = xe_lrc_create(hwe, NULL, NULL, SZ_16K, XE_IRQ_DEFAULT_MSIX, 0); 273 273 if (IS_ERR(port->lrc)) { 274 274 err = PTR_ERR(port->lrc); 275 275 goto err;
+7
drivers/gpu/drm/xe/xe_force_wake.c
··· 166 166 * xe_force_wake_ref_has_domain() function. Caller must call 167 167 * xe_force_wake_put() function to decrease incremented refcounts. 168 168 * 169 + * When possible, scope-based forcewake (through CLASS(xe_force_wake, ...) or 170 + * xe_with_force_wake()) should be used instead of direct calls to this 171 + * function. Direct usage of get/put should only be used when the function 172 + * has goto-based flows that can interfere with scope-based cleanup, or when 173 + * the lifetime of the forcewake reference does not match a specific scope 174 + * (e.g., forcewake obtained in one function and released in a different one). 175 + * 169 176 * Return: opaque reference to woken domains or zero if none of requested 170 177 * domains were awake. 171 178 */
+40
drivers/gpu/drm/xe/xe_force_wake.h
··· 61 61 return fw_ref & domain; 62 62 } 63 63 64 + struct xe_force_wake_ref { 65 + struct xe_force_wake *fw; 66 + unsigned int domains; 67 + }; 68 + 69 + static struct xe_force_wake_ref 70 + xe_force_wake_constructor(struct xe_force_wake *fw, unsigned int domains) 71 + { 72 + struct xe_force_wake_ref fw_ref = { .fw = fw }; 73 + 74 + fw_ref.domains = xe_force_wake_get(fw, domains); 75 + 76 + return fw_ref; 77 + } 78 + 79 + DEFINE_CLASS(xe_force_wake, struct xe_force_wake_ref, 80 + xe_force_wake_put(_T.fw, _T.domains), 81 + xe_force_wake_constructor(fw, domains), 82 + struct xe_force_wake *fw, unsigned int domains); 83 + 84 + /* 85 + * Scoped helper for the forcewake class, using the same trick as scoped_guard() 86 + * to bind the lifetime to the next statement/block. 87 + */ 88 + #define __xe_with_force_wake(ref, fw, domains, done) \ 89 + for (CLASS(xe_force_wake, ref)(fw, domains), *(done) = NULL; \ 90 + !(done); (done) = (void *)1) 91 + 92 + #define xe_with_force_wake(ref, fw, domains) \ 93 + __xe_with_force_wake(ref, fw, domains, __UNIQUE_ID(done)) 94 + 95 + /* 96 + * Used when xe_force_wake_constructor() has already been called by another 97 + * function and the current function is responsible for releasing the forcewake 98 + * reference in all possible cases and error paths. 99 + */ 100 + DEFINE_CLASS(xe_force_wake_release_only, struct xe_force_wake_ref, 101 + if (_T.fw) xe_force_wake_put(_T.fw, _T.domains), fw_ref, 102 + struct xe_force_wake_ref fw_ref); 103 + 64 104 #endif
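
xe_force_wake.h now wraps xe_force_wake_get()/xe_force_wake_put() in a cleanup class: CLASS(xe_force_wake, ...) gives a function-scope reference, xe_with_force_wake() limits it to a single block, and xe_force_wake_release_only adopts a reference obtained by another function (as xe_drm_client.c does above). A minimal sketch of the first two forms, assuming the caller already has a GT pointer; the TDF register write is only a placeholder borrowed from xe_device.c.

/*
 * Sketch only: illustrative helpers showing the two scope-based forms.
 * The register write is a placeholder; the forcewake handling mirrors
 * the conversions elsewhere in this series (xe_gt.c, xe_device.c, ...).
 */
static int example_function_scope(struct xe_gt *gt)
{
	/* Reference dropped automatically when fw_ref goes out of scope. */
	CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT);
	if (!xe_force_wake_ref_has_domain(fw_ref.domains, XE_FW_GT))
		return -ETIMEDOUT;

	xe_mmio_write32(&gt->mmio, XE2_TDF_CTRL, TRANSIENT_FLUSH_REQUEST);
	return 0;
}

static void example_statement_scope(struct xe_gt *gt)
{
	/* Reference held only for the body of xe_with_force_wake(). */
	xe_with_force_wake(fw_ref, gt_to_fw(gt), XE_FW_GT) {
		if (xe_force_wake_ref_has_domain(fw_ref.domains, XE_FW_GT))
			xe_mmio_write32(&gt->mmio, XE2_TDF_CTRL,
					TRANSIENT_FLUSH_REQUEST);
	}
}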
+1 -2
drivers/gpu/drm/xe/xe_ggtt.c
··· 396 396 delayed_removal_work); 397 397 struct xe_device *xe = tile_to_xe(node->ggtt->tile); 398 398 399 - xe_pm_runtime_get(xe); 399 + guard(xe_pm_runtime)(xe); 400 400 ggtt_node_remove(node); 401 - xe_pm_runtime_put(xe); 402 401 } 403 402 404 403 /**
+6 -15
drivers/gpu/drm/xe/xe_gsc.c
··· 352 352 struct xe_gsc *gsc = container_of(work, typeof(*gsc), work); 353 353 struct xe_gt *gt = gsc_to_gt(gsc); 354 354 struct xe_device *xe = gt_to_xe(gt); 355 - unsigned int fw_ref; 356 355 u32 actions; 357 356 int ret; 358 357 ··· 360 361 gsc->work_actions = 0; 361 362 spin_unlock_irq(&gsc->lock); 362 363 363 - xe_pm_runtime_get(xe); 364 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GSC); 364 + guard(xe_pm_runtime)(xe); 365 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GSC); 365 366 366 367 if (actions & GSC_ACTION_ER_COMPLETE) { 367 - ret = gsc_er_complete(gt); 368 - if (ret) 369 - goto out; 368 + if (gsc_er_complete(gt)) 369 + return; 370 370 } 371 371 372 372 if (actions & GSC_ACTION_FW_LOAD) { ··· 378 380 379 381 if (actions & GSC_ACTION_SW_PROXY) 380 382 xe_gsc_proxy_request_handler(gsc); 381 - 382 - out: 383 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 384 - xe_pm_runtime_put(xe); 385 383 } 386 384 387 385 void xe_gsc_hwe_irq_handler(struct xe_hw_engine *hwe, u16 intr_vec) ··· 609 615 { 610 616 struct xe_gt *gt = gsc_to_gt(gsc); 611 617 struct xe_mmio *mmio = &gt->mmio; 612 - unsigned int fw_ref; 613 618 614 619 xe_uc_fw_print(&gsc->fw, p); 615 620 ··· 617 624 if (!xe_uc_fw_is_enabled(&gsc->fw)) 618 625 return; 619 626 620 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GSC); 621 - if (!fw_ref) 627 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GSC); 628 + if (!fw_ref.domains) 622 629 return; 623 630 624 631 drm_printf(p, "\nHECI1 FWSTS: 0x%08x 0x%08x 0x%08x 0x%08x 0x%08x 0x%08x\n", ··· 628 635 xe_mmio_read32(mmio, HECI_FWSTS4(MTL_GSC_HECI1_BASE)), 629 636 xe_mmio_read32(mmio, HECI_FWSTS5(MTL_GSC_HECI1_BASE)), 630 637 xe_mmio_read32(mmio, HECI_FWSTS6(MTL_GSC_HECI1_BASE))); 631 - 632 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 633 638 }
+1 -2
drivers/gpu/drm/xe/xe_gsc_debugfs.c
··· 37 37 struct xe_device *xe = gsc_to_xe(gsc); 38 38 struct drm_printer p = drm_seq_file_printer(m); 39 39 40 - xe_pm_runtime_get(xe); 40 + guard(xe_pm_runtime)(xe); 41 41 xe_gsc_print_info(gsc, &p); 42 - xe_pm_runtime_put(xe); 43 42 44 43 return 0; 45 44 }
+7 -10
drivers/gpu/drm/xe/xe_gsc_proxy.c
··· 440 440 struct xe_gsc *gsc = arg; 441 441 struct xe_gt *gt = gsc_to_gt(gsc); 442 442 struct xe_device *xe = gt_to_xe(gt); 443 - unsigned int fw_ref = 0; 444 443 445 444 if (!gsc->proxy.component_added) 446 445 return; 447 446 448 447 /* disable HECI2 IRQs */ 449 - xe_pm_runtime_get(xe); 450 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GSC); 451 - if (!fw_ref) 452 - xe_gt_err(gt, "failed to get forcewake to disable GSC interrupts\n"); 448 + scoped_guard(xe_pm_runtime, xe) { 449 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GSC); 450 + if (!fw_ref.domains) 451 + xe_gt_err(gt, "failed to get forcewake to disable GSC interrupts\n"); 453 452 454 - /* try do disable irq even if forcewake failed */ 455 - gsc_proxy_irq_toggle(gsc, false); 456 - 457 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 458 - xe_pm_runtime_put(xe); 453 + /* try do disable irq even if forcewake failed */ 454 + gsc_proxy_irq_toggle(gsc, false); 455 + } 459 456 460 457 xe_gsc_wait_for_worker_completion(gsc); 461 458
+87 -87
drivers/gpu/drm/xe/xe_gt.c
··· 103 103 104 104 static void xe_gt_enable_host_l2_vram(struct xe_gt *gt) 105 105 { 106 - unsigned int fw_ref; 107 106 u32 reg; 108 107 109 108 if (!XE_GT_WA(gt, 16023588340)) 110 109 return; 111 110 112 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); 113 - if (!fw_ref) 111 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT); 112 + if (!fw_ref.domains) 114 113 return; 115 114 116 115 if (xe_gt_is_main_type(gt)) { ··· 119 120 } 120 121 121 122 xe_gt_mcr_multicast_write(gt, XEHPC_L3CLOS_MASK(3), 0xF); 122 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 123 123 } 124 124 125 125 static void xe_gt_disable_host_l2_vram(struct xe_gt *gt) 126 126 { 127 - unsigned int fw_ref; 128 127 u32 reg; 129 128 130 129 if (!XE_GT_WA(gt, 16023588340)) ··· 131 134 if (xe_gt_is_media_type(gt)) 132 135 return; 133 136 134 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); 135 - if (!fw_ref) 137 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT); 138 + if (!fw_ref.domains) 136 139 return; 137 140 138 141 reg = xe_gt_mcr_unicast_read_any(gt, XE2_GAMREQSTRM_CTRL); 139 142 reg &= ~CG_DIS_CNTLBUS; 140 143 xe_gt_mcr_multicast_write(gt, XE2_GAMREQSTRM_CTRL, reg); 141 - 142 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 143 144 } 144 145 145 146 static void gt_reset_worker(struct work_struct *w); ··· 384 389 385 390 int xe_gt_init_early(struct xe_gt *gt) 386 391 { 387 - unsigned int fw_ref; 388 392 int err; 389 393 390 394 if (IS_SRIOV_PF(gt_to_xe(gt))) { ··· 430 436 if (err) 431 437 return err; 432 438 433 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); 434 - if (!fw_ref) 439 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT); 440 + if (!fw_ref.domains) 435 441 return -ETIMEDOUT; 436 442 437 443 xe_gt_mcr_init_early(gt); 438 444 xe_pat_init(gt); 439 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 440 445 441 446 return 0; 442 447 } ··· 453 460 454 461 static int gt_init_with_gt_forcewake(struct xe_gt *gt) 455 462 { 456 - unsigned int fw_ref; 457 463 int err; 458 464 459 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); 460 - if (!fw_ref) 465 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT); 466 + if (!fw_ref.domains) 461 467 return -ETIMEDOUT; 462 468 463 469 err = xe_uc_init(&gt->uc); 464 470 if (err) 465 - goto err_force_wake; 471 + return err; 466 472 467 473 xe_gt_topology_init(gt); 468 474 xe_gt_mcr_init(gt); ··· 470 478 if (xe_gt_is_main_type(gt)) { 471 479 err = xe_ggtt_init(gt_to_tile(gt)->mem.ggtt); 472 480 if (err) 473 - goto err_force_wake; 481 + return err; 474 482 if (IS_SRIOV_PF(gt_to_xe(gt))) 475 483 xe_lmtt_init(&gt_to_tile(gt)->sriov.pf.lmtt); 476 484 } ··· 484 492 err = xe_hw_engines_init_early(gt); 485 493 if (err) { 486 494 dump_pat_on_error(gt); 487 - goto err_force_wake; 495 + return err; 488 496 } 489 497 490 498 err = xe_hw_engine_class_sysfs_init(gt); 491 499 if (err) 492 - goto err_force_wake; 500 + return err; 493 501 494 502 /* Initialize CCS mode sysfs after early initialization of HW engines */ 495 503 err = xe_gt_ccs_mode_sysfs_init(gt); 496 504 if (err) 497 - goto err_force_wake; 505 + return err; 498 506 499 507 /* 500 508 * Stash hardware-reported version. 
Since this register does not exist ··· 502 510 */ 503 511 gt->info.gmdid = xe_mmio_read32(&gt->mmio, GMD_ID); 504 512 505 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 506 513 return 0; 507 - 508 - err_force_wake: 509 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 510 - 511 - return err; 512 514 } 513 515 514 516 static int gt_init_with_all_forcewake(struct xe_gt *gt) 515 517 { 516 - unsigned int fw_ref; 517 518 int err; 518 519 519 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL); 520 - if (!xe_force_wake_ref_has_domain(fw_ref, XE_FORCEWAKE_ALL)) { 521 - err = -ETIMEDOUT; 522 - goto err_force_wake; 523 - } 520 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FORCEWAKE_ALL); 521 + if (!xe_force_wake_ref_has_domain(fw_ref.domains, XE_FORCEWAKE_ALL)) 522 + return -ETIMEDOUT; 524 523 525 524 xe_gt_mcr_set_implicit_defaults(gt); 526 525 xe_wa_process_gt(gt); ··· 520 537 521 538 err = xe_gt_clock_init(gt); 522 539 if (err) 523 - goto err_force_wake; 540 + return err; 524 541 525 542 xe_mocs_init(gt); 526 543 err = xe_execlist_init(gt); 527 544 if (err) 528 - goto err_force_wake; 545 + return err; 529 546 530 547 err = xe_hw_engines_init(gt); 531 548 if (err) 532 - goto err_force_wake; 549 + return err; 533 550 534 551 err = xe_uc_init_post_hwconfig(&gt->uc); 535 552 if (err) 536 - goto err_force_wake; 553 + return err; 537 554 538 555 if (xe_gt_is_main_type(gt)) { 539 556 /* ··· 544 561 545 562 gt->usm.bb_pool = xe_sa_bo_manager_init(gt_to_tile(gt), 546 563 IS_DGFX(xe) ? SZ_1M : SZ_512K, 16); 547 - if (IS_ERR(gt->usm.bb_pool)) { 548 - err = PTR_ERR(gt->usm.bb_pool); 549 - goto err_force_wake; 550 - } 564 + if (IS_ERR(gt->usm.bb_pool)) 565 + return PTR_ERR(gt->usm.bb_pool); 551 566 } 552 567 } 553 568 ··· 554 573 555 574 err = xe_migrate_init(tile->migrate); 556 575 if (err) 557 - goto err_force_wake; 576 + return err; 558 577 } 559 578 560 579 err = xe_uc_load_hw(&gt->uc); 561 580 if (err) 562 - goto err_force_wake; 581 + return err; 563 582 564 583 /* Configure default CCS mode of 1 engine with all resources */ 565 584 if (xe_gt_ccs_mode_enabled(gt)) { ··· 573 592 if (IS_SRIOV_PF(gt_to_xe(gt))) 574 593 xe_gt_sriov_pf_init_hw(gt); 575 594 576 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 577 - 578 595 return 0; 579 - 580 - err_force_wake: 581 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 582 - 583 - return err; 584 596 } 585 597 586 598 static void xe_gt_fini(void *arg) ··· 876 902 877 903 void xe_gt_suspend_prepare(struct xe_gt *gt) 878 904 { 879 - unsigned int fw_ref; 880 - 881 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL); 882 - 905 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FORCEWAKE_ALL); 883 906 xe_uc_suspend_prepare(&gt->uc); 884 - 885 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 886 907 } 887 908 888 909 int xe_gt_suspend(struct xe_gt *gt) 889 910 { 890 - unsigned int fw_ref; 891 911 int err; 892 912 893 913 xe_gt_dbg(gt, "suspending\n"); 894 914 xe_gt_sanitize(gt); 895 915 896 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL); 897 - if (!xe_force_wake_ref_has_domain(fw_ref, XE_FORCEWAKE_ALL)) 898 - goto err_msg; 916 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FORCEWAKE_ALL); 917 + if (!xe_force_wake_ref_has_domain(fw_ref.domains, XE_FORCEWAKE_ALL)) { 918 + xe_gt_err(gt, "suspend failed (%pe)\n", ERR_PTR(-ETIMEDOUT)); 919 + return -ETIMEDOUT; 920 + } 899 921 900 922 err = xe_uc_suspend(&gt->uc); 901 - if (err) 902 - goto err_force_wake; 923 + if (err) { 924 + xe_gt_err(gt, "suspend failed (%pe)\n", ERR_PTR(err)); 925 + return err; 926 + } 903 927 904 928 
xe_gt_idle_disable_pg(gt); 905 929 906 930 xe_gt_disable_host_l2_vram(gt); 907 931 908 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 909 932 xe_gt_dbg(gt, "suspended\n"); 910 933 911 934 return 0; 912 - 913 - err_msg: 914 - err = -ETIMEDOUT; 915 - err_force_wake: 916 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 917 - xe_gt_err(gt, "suspend failed (%pe)\n", ERR_PTR(err)); 918 - 919 - return err; 920 935 } 921 936 922 937 void xe_gt_shutdown(struct xe_gt *gt) 923 938 { 924 - unsigned int fw_ref; 925 - 926 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL); 939 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FORCEWAKE_ALL); 927 940 do_gt_reset(gt); 928 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 929 941 } 930 942 931 943 /** ··· 936 976 937 977 int xe_gt_resume(struct xe_gt *gt) 938 978 { 939 - unsigned int fw_ref; 940 979 int err; 941 980 942 981 xe_gt_dbg(gt, "resuming\n"); 943 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL); 944 - if (!xe_force_wake_ref_has_domain(fw_ref, XE_FORCEWAKE_ALL)) 945 - goto err_msg; 982 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FORCEWAKE_ALL); 983 + if (!xe_force_wake_ref_has_domain(fw_ref.domains, XE_FORCEWAKE_ALL)) { 984 + xe_gt_err(gt, "resume failed (%pe)\n", ERR_PTR(-ETIMEDOUT)); 985 + return -ETIMEDOUT; 986 + } 946 987 947 988 err = do_gt_restart(gt); 948 989 if (err) 949 - goto err_force_wake; 990 + return err; 950 991 951 992 xe_gt_idle_enable_pg(gt); 952 993 953 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 954 994 xe_gt_dbg(gt, "resumed\n"); 955 995 956 996 return 0; 997 + } 957 998 958 - err_msg: 959 - err = -ETIMEDOUT; 960 - err_force_wake: 961 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 962 - xe_gt_err(gt, "resume failed (%pe)\n", ERR_PTR(err)); 999 + /** 1000 + * xe_gt_runtime_suspend() - GT runtime suspend 1001 + * @gt: the GT object 1002 + * 1003 + * Return: 0 on success, negative error code otherwise. 1004 + */ 1005 + int xe_gt_runtime_suspend(struct xe_gt *gt) 1006 + { 1007 + xe_gt_dbg(gt, "runtime suspending\n"); 963 1008 964 - return err; 1009 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FORCEWAKE_ALL); 1010 + if (!xe_force_wake_ref_has_domain(fw_ref.domains, XE_FORCEWAKE_ALL)) { 1011 + xe_gt_err(gt, "runtime suspend failed (%pe)\n", ERR_PTR(-ETIMEDOUT)); 1012 + return -ETIMEDOUT; 1013 + } 1014 + 1015 + xe_uc_runtime_suspend(&gt->uc); 1016 + xe_gt_disable_host_l2_vram(gt); 1017 + 1018 + xe_gt_dbg(gt, "runtime suspended\n"); 1019 + 1020 + return 0; 1021 + } 1022 + 1023 + /** 1024 + * xe_gt_runtime_resume() - GT runtime resume 1025 + * @gt: the GT object 1026 + * 1027 + * Return: 0 on success, negative error code otherwise. 1028 + */ 1029 + int xe_gt_runtime_resume(struct xe_gt *gt) 1030 + { 1031 + xe_gt_dbg(gt, "runtime resuming\n"); 1032 + 1033 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FORCEWAKE_ALL); 1034 + if (!xe_force_wake_ref_has_domain(fw_ref.domains, XE_FORCEWAKE_ALL)) { 1035 + xe_gt_err(gt, "runtime resume failed (%pe)\n", ERR_PTR(-ETIMEDOUT)); 1036 + return -ETIMEDOUT; 1037 + } 1038 + 1039 + xe_gt_enable_host_l2_vram(gt); 1040 + xe_uc_runtime_resume(&gt->uc); 1041 + 1042 + xe_gt_dbg(gt, "runtime resumed\n"); 1043 + 1044 + return 0; 965 1045 } 966 1046 967 1047 struct xe_hw_engine *xe_gt_hw_engine(struct xe_gt *gt,
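
xe_gt.c gains dedicated runtime-PM entry points, xe_gt_runtime_suspend() and xe_gt_runtime_resume(), which take XE_FORCEWAKE_ALL for their duration and only suspend/resume the uC and toggle the host-L2 VRAM setting, with no full reset/restart as in xe_gt_suspend()/xe_gt_resume(). The hunks here do not show the xe_pm.c wiring, so the caller below is an assumption for illustration only:

/*
 * Sketch only: a plausible caller in the device runtime-suspend path.
 * The real wiring lives in xe_pm.c, which is not part of these hunks,
 * so this loop is an assumption for illustration.
 */
static int example_runtime_suspend_all_gts(struct xe_device *xe)
{
	struct xe_gt *gt;
	u8 id;
	int err;

	for_each_gt(gt, xe, id) {
		err = xe_gt_runtime_suspend(gt);
		if (err)
			return err;
	}

	return 0;
}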
+2
drivers/gpu/drm/xe/xe_gt.h
··· 58 58 void xe_gt_shutdown(struct xe_gt *gt); 59 59 int xe_gt_resume(struct xe_gt *gt); 60 60 void xe_gt_reset_async(struct xe_gt *gt); 61 + int xe_gt_runtime_resume(struct xe_gt *gt); 62 + int xe_gt_runtime_suspend(struct xe_gt *gt); 61 63 void xe_gt_sanitize(struct xe_gt *gt); 62 64 int xe_gt_sanitize_freq(struct xe_gt *gt); 63 65
+9 -21
drivers/gpu/drm/xe/xe_gt_debugfs.c
··· 105 105 struct drm_info_node *node = m->private; 106 106 struct xe_gt *gt = node_to_gt(node); 107 107 struct xe_device *xe = gt_to_xe(gt); 108 - int ret; 109 108 110 - xe_pm_runtime_get(xe); 111 - ret = xe_gt_debugfs_simple_show(m, data); 112 - xe_pm_runtime_put(xe); 113 - 114 - return ret; 109 + guard(xe_pm_runtime)(xe); 110 + return xe_gt_debugfs_simple_show(m, data); 115 111 } 116 112 117 113 static int hw_engines(struct xe_gt *gt, struct drm_printer *p) 118 114 { 119 115 struct xe_hw_engine *hwe; 120 116 enum xe_hw_engine_id id; 121 - unsigned int fw_ref; 122 - int ret = 0; 123 117 124 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL); 125 - if (!xe_force_wake_ref_has_domain(fw_ref, XE_FORCEWAKE_ALL)) { 126 - ret = -ETIMEDOUT; 127 - goto fw_put; 128 - } 118 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FORCEWAKE_ALL); 119 + if (!xe_force_wake_ref_has_domain(fw_ref.domains, XE_FORCEWAKE_ALL)) 120 + return -ETIMEDOUT; 129 121 130 122 for_each_hw_engine(hwe, gt, id) 131 123 xe_hw_engine_print(hwe, p); 132 124 133 - fw_put: 134 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 135 - 136 - return ret; 125 + return 0; 137 126 } 138 127 139 128 static int steering(struct xe_gt *gt, struct drm_printer *p) ··· 209 220 { "default_lrc_vcs", .show = xe_gt_debugfs_show_with_rpm, .data = vcs_default_lrc }, 210 221 { "default_lrc_vecs", .show = xe_gt_debugfs_show_with_rpm, .data = vecs_default_lrc }, 211 222 { "hwconfig", .show = xe_gt_debugfs_show_with_rpm, .data = hwconfig }, 223 + { "pat_sw_config", .show = xe_gt_debugfs_simple_show, .data = xe_pat_dump_sw_config }, 212 224 }; 213 225 214 226 /* everything else should be added here */ ··· 259 269 { 260 270 struct xe_device *xe = gt_to_xe(gt); 261 271 262 - xe_pm_runtime_get(xe); 272 + guard(xe_pm_runtime)(xe); 263 273 xe_gt_reset_async(gt); 264 - xe_pm_runtime_put(xe); 265 274 } 266 275 267 276 static ssize_t force_reset_write(struct file *file, ··· 286 297 { 287 298 struct xe_device *xe = gt_to_xe(gt); 288 299 289 - xe_pm_runtime_get(xe); 300 + guard(xe_pm_runtime)(xe); 290 301 xe_gt_reset(gt); 291 - xe_pm_runtime_put(xe); 292 302 } 293 303 294 304 static ssize_t force_reset_sync_write(struct file *file,
+9 -18
drivers/gpu/drm/xe/xe_gt_freq.c
··· 70 70 struct xe_guc_pc *pc = dev_to_pc(dev); 71 71 u32 freq; 72 72 73 - xe_pm_runtime_get(dev_to_xe(dev)); 73 + guard(xe_pm_runtime)(dev_to_xe(dev)); 74 74 freq = xe_guc_pc_get_act_freq(pc); 75 - xe_pm_runtime_put(dev_to_xe(dev)); 76 75 77 76 return sysfs_emit(buf, "%d\n", freq); 78 77 } ··· 85 86 u32 freq; 86 87 ssize_t ret; 87 88 88 - xe_pm_runtime_get(dev_to_xe(dev)); 89 + guard(xe_pm_runtime)(dev_to_xe(dev)); 89 90 ret = xe_guc_pc_get_cur_freq(pc, &freq); 90 - xe_pm_runtime_put(dev_to_xe(dev)); 91 91 if (ret) 92 92 return ret; 93 93 ··· 111 113 struct xe_guc_pc *pc = dev_to_pc(dev); 112 114 u32 freq; 113 115 114 - xe_pm_runtime_get(dev_to_xe(dev)); 116 + guard(xe_pm_runtime)(dev_to_xe(dev)); 115 117 freq = xe_guc_pc_get_rpe_freq(pc); 116 - xe_pm_runtime_put(dev_to_xe(dev)); 117 118 118 119 return sysfs_emit(buf, "%d\n", freq); 119 120 } ··· 125 128 struct xe_guc_pc *pc = dev_to_pc(dev); 126 129 u32 freq; 127 130 128 - xe_pm_runtime_get(dev_to_xe(dev)); 131 + guard(xe_pm_runtime)(dev_to_xe(dev)); 129 132 freq = xe_guc_pc_get_rpa_freq(pc); 130 - xe_pm_runtime_put(dev_to_xe(dev)); 131 133 132 134 return sysfs_emit(buf, "%d\n", freq); 133 135 } ··· 150 154 u32 freq; 151 155 ssize_t ret; 152 156 153 - xe_pm_runtime_get(dev_to_xe(dev)); 157 + guard(xe_pm_runtime)(dev_to_xe(dev)); 154 158 ret = xe_guc_pc_get_min_freq(pc, &freq); 155 - xe_pm_runtime_put(dev_to_xe(dev)); 156 159 if (ret) 157 160 return ret; 158 161 ··· 170 175 if (ret) 171 176 return ret; 172 177 173 - xe_pm_runtime_get(dev_to_xe(dev)); 178 + guard(xe_pm_runtime)(dev_to_xe(dev)); 174 179 ret = xe_guc_pc_set_min_freq(pc, freq); 175 - xe_pm_runtime_put(dev_to_xe(dev)); 176 180 if (ret) 177 181 return ret; 178 182 ··· 187 193 u32 freq; 188 194 ssize_t ret; 189 195 190 - xe_pm_runtime_get(dev_to_xe(dev)); 196 + guard(xe_pm_runtime)(dev_to_xe(dev)); 191 197 ret = xe_guc_pc_get_max_freq(pc, &freq); 192 - xe_pm_runtime_put(dev_to_xe(dev)); 193 198 if (ret) 194 199 return ret; 195 200 ··· 207 214 if (ret) 208 215 return ret; 209 216 210 - xe_pm_runtime_get(dev_to_xe(dev)); 217 + guard(xe_pm_runtime)(dev_to_xe(dev)); 211 218 ret = xe_guc_pc_set_max_freq(pc, freq); 212 - xe_pm_runtime_put(dev_to_xe(dev)); 213 219 if (ret) 214 220 return ret; 215 221 ··· 235 243 struct xe_guc_pc *pc = dev_to_pc(dev); 236 244 int err; 237 245 238 - xe_pm_runtime_get(dev_to_xe(dev)); 246 + guard(xe_pm_runtime)(dev_to_xe(dev)); 239 247 err = xe_guc_pc_set_power_profile(pc, buff); 240 - xe_pm_runtime_put(dev_to_xe(dev)); 241 248 242 249 return err ?: count; 243 250 }
+12 -29
drivers/gpu/drm/xe/xe_gt_idle.c
··· 105 105 struct xe_gt_idle *gtidle = &gt->gtidle; 106 106 struct xe_mmio *mmio = &gt->mmio; 107 107 u32 vcs_mask, vecs_mask; 108 - unsigned int fw_ref; 109 108 int i, j; 110 109 111 110 if (IS_SRIOV_VF(xe)) ··· 136 137 } 137 138 } 138 139 139 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); 140 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT); 140 141 if (xe->info.skip_guc_pc) { 141 142 /* 142 143 * GuC sets the hysteresis value when GuC PC is enabled ··· 153 154 VDN_MFXVDENC_POWERGATE_ENABLE(2)); 154 155 155 156 xe_mmio_write32(mmio, POWERGATE_ENABLE, gtidle->powergate_enable); 156 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 157 157 } 158 158 159 159 void xe_gt_idle_disable_pg(struct xe_gt *gt) 160 160 { 161 161 struct xe_gt_idle *gtidle = &gt->gtidle; 162 - unsigned int fw_ref; 163 162 164 163 if (IS_SRIOV_VF(gt_to_xe(gt))) 165 164 return; ··· 165 168 xe_device_assert_mem_access(gt_to_xe(gt)); 166 169 gtidle->powergate_enable = 0; 167 170 168 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); 171 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT); 169 172 xe_mmio_write32(&gt->mmio, POWERGATE_ENABLE, gtidle->powergate_enable); 170 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 171 173 } 172 174 173 175 /** ··· 185 189 enum xe_gt_idle_state state; 186 190 u32 pg_enabled, pg_status = 0; 187 191 u32 vcs_mask, vecs_mask; 188 - unsigned int fw_ref; 189 192 int n; 190 193 /* 191 194 * Media Slices ··· 221 226 222 227 /* Do not wake the GT to read powergating status */ 223 228 if (state != GT_IDLE_C6) { 224 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); 225 - if (!fw_ref) 229 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT); 230 + if (!fw_ref.domains) 226 231 return -ETIMEDOUT; 227 232 228 233 pg_enabled = xe_mmio_read32(&gt->mmio, POWERGATE_ENABLE); 229 234 pg_status = xe_mmio_read32(&gt->mmio, POWERGATE_DOMAIN_STATUS); 230 - 231 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 232 235 } 233 236 234 237 if (gt->info.engine_mask & XE_HW_ENGINE_RCS_MASK) { ··· 264 271 struct device *dev = kobj_to_dev(kobj); 265 272 struct xe_gt_idle *gtidle = dev_to_gtidle(dev); 266 273 struct xe_guc_pc *pc = gtidle_to_pc(gtidle); 267 - ssize_t ret; 268 274 269 - xe_pm_runtime_get(pc_to_xe(pc)); 270 - ret = sysfs_emit(buff, "%s\n", gtidle->name); 271 - xe_pm_runtime_put(pc_to_xe(pc)); 272 - 273 - return ret; 275 + guard(xe_pm_runtime)(pc_to_xe(pc)); 276 + return sysfs_emit(buff, "%s\n", gtidle->name); 274 277 } 275 278 static struct kobj_attribute name_attr = __ATTR_RO(name); 276 279 ··· 278 289 struct xe_guc_pc *pc = gtidle_to_pc(gtidle); 279 290 enum xe_gt_idle_state state; 280 291 281 - xe_pm_runtime_get(pc_to_xe(pc)); 282 - state = gtidle->idle_status(pc); 283 - xe_pm_runtime_put(pc_to_xe(pc)); 292 + scoped_guard(xe_pm_runtime, pc_to_xe(pc)) 293 + state = gtidle->idle_status(pc); 284 294 285 295 return sysfs_emit(buff, "%s\n", gt_idle_state_to_string(state)); 286 296 } ··· 307 319 struct xe_guc_pc *pc = gtidle_to_pc(gtidle); 308 320 u64 residency; 309 321 310 - xe_pm_runtime_get(pc_to_xe(pc)); 311 - residency = xe_gt_idle_residency_msec(gtidle); 312 - xe_pm_runtime_put(pc_to_xe(pc)); 322 + scoped_guard(xe_pm_runtime, pc_to_xe(pc)) 323 + residency = xe_gt_idle_residency_msec(gtidle); 313 324 314 325 return sysfs_emit(buff, "%llu\n", residency); 315 326 } ··· 391 404 392 405 int xe_gt_idle_disable_c6(struct xe_gt *gt) 393 406 { 394 - unsigned int fw_ref; 395 - 396 407 xe_device_assert_mem_access(gt_to_xe(gt)); 397 408 398 409 if (IS_SRIOV_VF(gt_to_xe(gt))) 399 410 return 0; 400 411 401 - 
fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); 402 - if (!fw_ref) 412 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT); 413 + if (!fw_ref.domains) 403 414 return -ETIMEDOUT; 404 415 405 416 xe_mmio_write32(&gt->mmio, RC_CONTROL, 0); 406 417 xe_mmio_write32(&gt->mmio, RC_STATE, 0); 407 - 408 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 409 418 410 419 return 0; 411 420 }
+12 -7
drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c
··· 269 269 } 270 270 271 271 /* Return: number of configuration dwords written */ 272 - static u32 encode_config(u32 *cfg, const struct xe_gt_sriov_config *config, bool details) 272 + static u32 encode_config(struct xe_gt *gt, u32 *cfg, const struct xe_gt_sriov_config *config, 273 + bool details) 273 274 { 274 275 u32 n = 0; 275 276 ··· 304 303 cfg[n++] = PREP_GUC_KLV_TAG(VF_CFG_PREEMPT_TIMEOUT); 305 304 cfg[n++] = config->preempt_timeout; 306 305 307 - #define encode_threshold_config(TAG, ...) ({ \ 308 - cfg[n++] = PREP_GUC_KLV_TAG(VF_CFG_THRESHOLD_##TAG); \ 309 - cfg[n++] = config->thresholds[MAKE_XE_GUC_KLV_THRESHOLD_INDEX(TAG)]; \ 306 + #define encode_threshold_config(TAG, NAME, VER...) ({ \ 307 + if (IF_ARGS(GUC_FIRMWARE_VER_AT_LEAST(&gt->uc.guc, VER), true, VER)) { \ 308 + cfg[n++] = PREP_GUC_KLV_TAG(VF_CFG_THRESHOLD_##TAG); \ 309 + cfg[n++] = config->thresholds[MAKE_XE_GUC_KLV_THRESHOLD_INDEX(TAG)]; \ 310 + } \ 310 311 }); 311 312 312 313 MAKE_XE_GUC_KLV_THRESHOLDS_SET(encode_threshold_config); ··· 331 328 return -ENOBUFS; 332 329 333 330 cfg = xe_guc_buf_cpu_ptr(buf); 334 - num_dwords = encode_config(cfg, config, true); 331 + num_dwords = encode_config(gt, cfg, config, true); 335 332 xe_gt_assert(gt, num_dwords <= max_cfg_dwords); 336 333 337 334 if (xe_gt_is_media_type(gt)) { ··· 2521 2518 ret = -ENOBUFS; 2522 2519 } else { 2523 2520 config = pf_pick_vf_config(gt, vfid); 2524 - ret = encode_config(buf, config, false) * sizeof(u32); 2521 + ret = encode_config(gt, buf, config, false) * sizeof(u32); 2525 2522 } 2526 2523 } 2527 2524 mutex_unlock(xe_gt_sriov_pf_master_mutex(gt)); ··· 2554 2551 return pf_provision_preempt_timeout(gt, vfid, value[0]); 2555 2552 2556 2553 /* auto-generate case statements */ 2557 - #define define_threshold_key_to_provision_case(TAG, ...) \ 2554 + #define define_threshold_key_to_provision_case(TAG, NAME, VER...) \ 2558 2555 case MAKE_GUC_KLV_VF_CFG_THRESHOLD_KEY(TAG): \ 2559 2556 BUILD_BUG_ON(MAKE_GUC_KLV_VF_CFG_THRESHOLD_LEN(TAG) != 1u); \ 2560 2557 if (len != MAKE_GUC_KLV_VF_CFG_THRESHOLD_LEN(TAG)) \ 2561 2558 return -EBADMSG; \ 2559 + if (IF_ARGS(!GUC_FIRMWARE_VER_AT_LEAST(&gt->uc.guc, VER), false, VER)) \ 2560 + return -EKEYREJECTED; \ 2562 2561 return pf_provision_threshold(gt, vfid, \ 2563 2562 MAKE_XE_GUC_KLV_THRESHOLD_INDEX(TAG), \ 2564 2563 value[0]);
+10 -11
drivers/gpu/drm/xe/xe_gt_sriov_pf_debugfs.c
··· 21 21 #include "xe_gt_sriov_pf_monitor.h" 22 22 #include "xe_gt_sriov_pf_policy.h" 23 23 #include "xe_gt_sriov_pf_service.h" 24 + #include "xe_guc.h" 24 25 #include "xe_pm.h" 25 26 #include "xe_sriov_pf.h" 26 27 #include "xe_sriov_pf_provision.h" ··· 124 123 if (val > (TYPE)~0ull) \ 125 124 return -EOVERFLOW; \ 126 125 \ 127 - xe_pm_runtime_get(xe); \ 126 + guard(xe_pm_runtime)(xe); \ 128 127 err = xe_gt_sriov_pf_policy_set_##POLICY(gt, val); \ 129 128 if (!err) \ 130 129 xe_sriov_pf_provision_set_custom_mode(xe); \ 131 - xe_pm_runtime_put(xe); \ 132 130 \ 133 131 return err; \ 134 132 } \ ··· 189 189 if (val > (TYPE)~0ull) \ 190 190 return -EOVERFLOW; \ 191 191 \ 192 - xe_pm_runtime_get(xe); \ 192 + guard(xe_pm_runtime)(xe); \ 193 193 err = xe_sriov_pf_wait_ready(xe) ?: \ 194 194 xe_gt_sriov_pf_config_set_##CONFIG(gt, vfid, val); \ 195 195 if (!err) \ 196 196 xe_sriov_pf_provision_set_custom_mode(xe); \ 197 - xe_pm_runtime_put(xe); \ 198 197 \ 199 198 return err; \ 200 199 } \ ··· 248 249 if (val > (u32)~0ull) 249 250 return -EOVERFLOW; 250 251 251 - xe_pm_runtime_get(xe); 252 + guard(xe_pm_runtime)(xe); 252 253 err = xe_gt_sriov_pf_config_set_threshold(gt, vfid, index, val); 253 254 if (!err) 254 255 xe_sriov_pf_provision_set_custom_mode(xe); 255 - xe_pm_runtime_put(xe); 256 256 257 257 return err; 258 258 } ··· 302 304 &sched_priority_fops); 303 305 304 306 /* register all threshold attributes */ 305 - #define register_threshold_attribute(TAG, NAME, ...) \ 306 - debugfs_create_file_unsafe("threshold_" #NAME, 0644, parent, parent, \ 307 - &NAME##_fops); 307 + #define register_threshold_attribute(TAG, NAME, VER...) ({ \ 308 + if (IF_ARGS(GUC_FIRMWARE_VER_AT_LEAST(&gt->uc.guc, VER), true, VER)) \ 309 + debugfs_create_file_unsafe("threshold_" #NAME, 0644, parent, parent, \ 310 + &NAME##_fops); \ 311 + }); 308 312 MAKE_XE_GUC_KLV_THRESHOLDS_SET(register_threshold_attribute) 309 313 #undef register_threshold_attribute 310 314 } ··· 358 358 xe_gt_assert(gt, sizeof(cmd) > strlen(control_cmds[n].cmd)); 359 359 360 360 if (sysfs_streq(cmd, control_cmds[n].cmd)) { 361 - xe_pm_runtime_get(xe); 361 + guard(xe_pm_runtime)(xe); 362 362 ret = control_cmds[n].fn ? (*control_cmds[n].fn)(gt, vfid) : 0; 363 - xe_pm_runtime_put(xe); 364 363 break; 365 364 } 366 365 }
+1 -1
drivers/gpu/drm/xe/xe_gt_sriov_pf_migration.c
··· 1026 1026 1027 1027 static void pf_gt_migration_check_support(struct xe_gt *gt) 1028 1028 { 1029 - if (GUC_FIRMWARE_VER(&gt->uc.guc) < MAKE_GUC_VER(70, 54, 0)) 1029 + if (!GUC_FIRMWARE_VER_AT_LEAST(&gt->uc.guc, 70, 54)) 1030 1030 xe_sriov_pf_migration_disable(gt_to_xe(gt), "requires GuC version >= 70.54.0"); 1031 1031 } 1032 1032
+126 -44
drivers/gpu/drm/xe/xe_gt_sriov_vf.c
··· 5 5 6 6 #include <linux/bitfield.h> 7 7 #include <linux/bsearch.h> 8 + #include <linux/delay.h> 8 9 9 10 #include <drm/drm_managed.h> 10 11 #include <drm/drm_print.h> ··· 41 40 #include "xe_wopcm.h" 42 41 43 42 #define make_u64_from_u32(hi, lo) ((u64)((u64)(u32)(hi) << 32 | (u32)(lo))) 43 + 44 + #ifdef CONFIG_DRM_XE_DEBUG 45 + enum VF_MIGRATION_WAIT_POINTS { 46 + VF_MIGRATION_WAIT_RESFIX_START = BIT(0), 47 + VF_MIGRATION_WAIT_FIXUPS = BIT(1), 48 + VF_MIGRATION_WAIT_RESTART_JOBS = BIT(2), 49 + VF_MIGRATION_WAIT_RESFIX_DONE = BIT(3), 50 + }; 51 + 52 + #define VF_MIGRATION_WAIT_DELAY_IN_MS 1000 53 + static void vf_post_migration_inject_wait(struct xe_gt *gt, 54 + enum VF_MIGRATION_WAIT_POINTS wait) 55 + { 56 + while (gt->sriov.vf.migration.debug.resfix_stoppers & wait) { 57 + xe_gt_dbg(gt, 58 + "*TESTING* injecting %u ms delay due to resfix_stoppers=%#x, to continue clear %#x\n", 59 + VF_MIGRATION_WAIT_DELAY_IN_MS, 60 + gt->sriov.vf.migration.debug.resfix_stoppers, wait); 61 + 62 + msleep(VF_MIGRATION_WAIT_DELAY_IN_MS); 63 + } 64 + } 65 + 66 + #define VF_MIGRATION_INJECT_WAIT(gt, _POS) ({ \ 67 + struct xe_gt *__gt = (gt); \ 68 + vf_post_migration_inject_wait(__gt, VF_MIGRATION_WAIT_##_POS); \ 69 + }) 70 + 71 + #else 72 + #define VF_MIGRATION_INJECT_WAIT(_gt, ...) typecheck(struct xe_gt *, (_gt)) 73 + #endif 44 74 45 75 static int guc_action_vf_reset(struct xe_guc *guc) 46 76 { ··· 331 299 *found = gt->sriov.vf.guc_version; 332 300 } 333 301 334 - static int guc_action_vf_notify_resfix_done(struct xe_guc *guc) 302 + static int guc_action_vf_resfix_start(struct xe_guc *guc, u16 marker) 335 303 { 336 304 u32 request[GUC_HXG_REQUEST_MSG_MIN_LEN] = { 337 305 FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) | 338 306 FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) | 339 - FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION, GUC_ACTION_VF2GUC_NOTIFY_RESFIX_DONE), 307 + FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION, GUC_ACTION_VF2GUC_RESFIX_START) | 308 + FIELD_PREP(VF2GUC_RESFIX_START_REQUEST_MSG_0_MARKER, marker), 340 309 }; 341 310 int ret; 342 311 ··· 346 313 return ret > 0 ? -EPROTO : ret; 347 314 } 348 315 349 - /** 350 - * vf_notify_resfix_done - Notify GuC about resource fixups apply completed. 351 - * @gt: the &xe_gt struct instance linked to target GuC 352 - * 353 - * Returns: 0 if the operation completed successfully, or a negative error 354 - * code otherwise. 
355 - */ 356 - static int vf_notify_resfix_done(struct xe_gt *gt) 316 + static int vf_resfix_start(struct xe_gt *gt, u16 marker) 357 317 { 358 318 struct xe_guc *guc = &gt->uc.guc; 359 - int err; 360 319 361 320 xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt))); 362 321 363 - err = guc_action_vf_notify_resfix_done(guc); 364 - if (unlikely(err)) 365 - xe_gt_sriov_err(gt, "Failed to notify GuC about resource fixup done (%pe)\n", 366 - ERR_PTR(err)); 367 - else 368 - xe_gt_sriov_dbg_verbose(gt, "sent GuC resource fixup done\n"); 322 + VF_MIGRATION_INJECT_WAIT(gt, RESFIX_START); 369 323 370 - return err; 324 + xe_gt_sriov_dbg_verbose(gt, "Sending resfix start marker %u\n", marker); 325 + 326 + return guc_action_vf_resfix_start(guc, marker); 327 + } 328 + 329 + static int guc_action_vf_resfix_done(struct xe_guc *guc, u16 marker) 330 + { 331 + u32 request[GUC_HXG_REQUEST_MSG_MIN_LEN] = { 332 + FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) | 333 + FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) | 334 + FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION, GUC_ACTION_VF2GUC_RESFIX_DONE) | 335 + FIELD_PREP(VF2GUC_RESFIX_DONE_REQUEST_MSG_0_MARKER, marker), 336 + }; 337 + int ret; 338 + 339 + ret = xe_guc_mmio_send(guc, request, ARRAY_SIZE(request)); 340 + 341 + return ret > 0 ? -EPROTO : ret; 342 + } 343 + 344 + static int vf_resfix_done(struct xe_gt *gt, u16 marker) 345 + { 346 + struct xe_guc *guc = &gt->uc.guc; 347 + 348 + xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt))); 349 + 350 + xe_gt_sriov_dbg_verbose(gt, "Sending resfix done marker %u\n", marker); 351 + 352 + return guc_action_vf_resfix_done(guc, marker); 371 353 } 372 354 373 355 static int guc_action_query_single_klv(struct xe_guc *guc, u32 key, ··· 1171 1123 return true; 1172 1124 } 1173 1125 1174 - spin_lock_irq(&gt->sriov.vf.migration.lock); 1175 - gt->sriov.vf.migration.recovery_queued = false; 1176 - spin_unlock_irq(&gt->sriov.vf.migration.lock); 1177 - 1178 1126 xe_guc_ct_flush_and_stop(&gt->uc.guc.ct); 1179 - xe_guc_submit_pause(&gt->uc.guc); 1127 + xe_guc_submit_pause_vf(&gt->uc.guc); 1180 1128 xe_tlb_inval_reset(&gt->tlb_inval); 1181 1129 1182 1130 return false; ··· 1187 1143 { 1188 1144 void *buf = gt->sriov.vf.migration.scratch; 1189 1145 int err; 1146 + 1147 + VF_MIGRATION_INJECT_WAIT(gt, FIXUPS); 1190 1148 1191 1149 /* xe_gt_sriov_vf_query_config will fixup the GGTT addresses */ 1192 1150 err = xe_gt_sriov_vf_query_config(gt); ··· 1208 1162 1209 1163 static void vf_post_migration_rearm(struct xe_gt *gt) 1210 1164 { 1165 + VF_MIGRATION_INJECT_WAIT(gt, RESTART_JOBS); 1166 + 1167 + /* 1168 + * Make sure interrupts on the new HW are properly set. The GuC IRQ 1169 + * must be working at this point, since the recovery did started, 1170 + * but the rest was not enabled using the procedure from spec. 
1171 + */ 1172 + xe_irq_resume(gt_to_xe(gt)); 1173 + 1211 1174 xe_guc_ct_restart(&gt->uc.guc.ct); 1212 - xe_guc_submit_unpause_prepare(&gt->uc.guc); 1175 + xe_guc_submit_unpause_prepare_vf(&gt->uc.guc); 1213 1176 } 1214 1177 1215 1178 static void vf_post_migration_kickstart(struct xe_gt *gt) 1216 1179 { 1217 - xe_guc_submit_unpause(&gt->uc.guc); 1180 + xe_guc_submit_unpause_vf(&gt->uc.guc); 1218 1181 } 1219 1182 1220 1183 static void vf_post_migration_abort(struct xe_gt *gt) ··· 1238 1183 xe_guc_submit_pause_abort(&gt->uc.guc); 1239 1184 } 1240 1185 1241 - static int vf_post_migration_notify_resfix_done(struct xe_gt *gt) 1186 + static int vf_post_migration_resfix_done(struct xe_gt *gt, u16 marker) 1242 1187 { 1243 - bool skip_resfix = false; 1188 + VF_MIGRATION_INJECT_WAIT(gt, RESFIX_DONE); 1244 1189 1245 1190 spin_lock_irq(&gt->sriov.vf.migration.lock); 1246 - if (gt->sriov.vf.migration.recovery_queued) { 1247 - skip_resfix = true; 1248 - xe_gt_sriov_dbg(gt, "another recovery imminent, resfix skipped\n"); 1249 - } else { 1191 + if (gt->sriov.vf.migration.recovery_queued) 1192 + xe_gt_sriov_dbg(gt, "another recovery imminent\n"); 1193 + else 1250 1194 WRITE_ONCE(gt->sriov.vf.migration.recovery_inprogress, false); 1251 - } 1252 1195 spin_unlock_irq(&gt->sriov.vf.migration.lock); 1253 1196 1254 - if (skip_resfix) 1255 - return -EAGAIN; 1197 + return vf_resfix_done(gt, marker); 1198 + } 1256 1199 1257 - /* 1258 - * Make sure interrupts on the new HW are properly set. The GuC IRQ 1259 - * must be working at this point, since the recovery did started, 1260 - * but the rest was not enabled using the procedure from spec. 1261 - */ 1262 - xe_irq_resume(gt_to_xe(gt)); 1200 + static int vf_post_migration_resfix_start(struct xe_gt *gt, u16 marker) 1201 + { 1202 + int err; 1263 1203 1264 - return vf_notify_resfix_done(gt); 1204 + err = vf_resfix_start(gt, marker); 1205 + 1206 + guard(spinlock_irq) (&gt->sriov.vf.migration.lock); 1207 + gt->sriov.vf.migration.recovery_queued = false; 1208 + 1209 + return err; 1210 + } 1211 + 1212 + static u16 vf_post_migration_next_resfix_marker(struct xe_gt *gt) 1213 + { 1214 + xe_gt_assert(gt, IS_SRIOV_VF(gt_to_xe(gt))); 1215 + 1216 + BUILD_BUG_ON(1 + ((typeof(gt->sriov.vf.migration.resfix_marker))~0) > 1217 + FIELD_MAX(VF2GUC_RESFIX_START_REQUEST_MSG_0_MARKER)); 1218 + 1219 + /* add 1 to avoid zero-marker */ 1220 + return 1 + gt->sriov.vf.migration.resfix_marker++; 1265 1221 } 1266 1222 1267 1223 static void vf_post_migration_recovery(struct xe_gt *gt) 1268 1224 { 1269 1225 struct xe_device *xe = gt_to_xe(gt); 1270 - int err; 1226 + u16 marker; 1271 1227 bool retry; 1228 + int err; 1272 1229 1273 1230 xe_gt_sriov_dbg(gt, "migration recovery in progress\n"); 1274 1231 ··· 1294 1227 goto fail; 1295 1228 } 1296 1229 1230 + marker = vf_post_migration_next_resfix_marker(gt); 1231 + 1232 + err = vf_post_migration_resfix_start(gt, marker); 1233 + if (unlikely(err)) { 1234 + xe_gt_sriov_err(gt, "Recovery failed at GuC RESFIX_START step (%pe)\n", 1235 + ERR_PTR(err)); 1236 + goto fail; 1237 + } 1238 + 1297 1239 err = vf_post_migration_fixups(gt); 1298 1240 if (err) 1299 1241 goto fail; 1300 1242 1301 1243 vf_post_migration_rearm(gt); 1302 1244 1303 - err = vf_post_migration_notify_resfix_done(gt); 1304 - if (err && err != -EAGAIN) 1245 + err = vf_post_migration_resfix_done(gt, marker); 1246 + if (err) { 1247 + if (err == -EREMCHG) 1248 + goto queue; 1249 + 1250 + xe_gt_sriov_err(gt, "Recovery failed at GuC RESFIX_DONE step (%pe)\n", 1251 + ERR_PTR(err)); 1305 1252 goto fail; 
1253 + } 1306 1254 1307 1255 vf_post_migration_kickstart(gt); 1308 1256
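The RESFIX_START/RESFIX_DONE pair above is correlated by a small non-zero marker derived from an 8-bit counter; the BUILD_BUG_ON only has to prove that 1 plus the counter maximum still fits the GuC message field. A standalone sketch of that numbering scheme, with hypothetical names rather than the driver's own:

	#include <stdint.h>

	/* 8-bit counter -> markers 1..256, never 0, then wraps back to 1 */
	static uint16_t next_resfix_marker(uint8_t *counter)
	{
		return 1 + (*counter)++;
	}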
+12
drivers/gpu/drm/xe/xe_gt_sriov_vf_debugfs.c
··· 69 69 vfdentry->d_inode->i_private = gt; 70 70 71 71 drm_debugfs_create_files(vf_info, ARRAY_SIZE(vf_info), vfdentry, minor); 72 + 73 + /* 74 + * /sys/kernel/debug/dri/BDF/ 75 + * ├── tile0 76 + * ├── gt0 77 + * ├── vf 78 + * ├── resfix_stoppers 79 + */ 80 + if (IS_ENABLED(CONFIG_DRM_XE_DEBUG)) { 81 + debugfs_create_x8("resfix_stoppers", 0600, vfdentry, 82 + &gt->sriov.vf.migration.debug.resfix_stoppers); 83 + } 72 84 }
+13
drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
··· 52 52 wait_queue_head_t wq; 53 53 /** @scratch: Scratch memory for VF recovery */ 54 54 void *scratch; 55 + /** @debug: Debug hooks for delaying migration */ 56 + struct { 57 + /** 58 + * @debug.resfix_stoppers: Stop and wait at different stages 59 + * during post migration recovery 60 + */ 61 + u8 resfix_stoppers; 62 + } debug; 63 + /** 64 + * @resfix_marker: Marker sent on start and on end of post-migration 65 + * steps. 66 + */ 67 + u8 resfix_marker; 55 68 /** @recovery_teardown: VF post migration recovery is being torn down */ 56 69 bool recovery_teardown; 57 70 /** @recovery_queued: VF post migration recovery in queued */
+10
drivers/gpu/drm/xe/xe_gt_stats.c
··· 66 66 DEF_STAT_STR(SVM_4K_BIND_US, "svm_4K_bind_us"), 67 67 DEF_STAT_STR(SVM_64K_BIND_US, "svm_64K_bind_us"), 68 68 DEF_STAT_STR(SVM_2M_BIND_US, "svm_2M_bind_us"), 69 + DEF_STAT_STR(HW_ENGINE_GROUP_SUSPEND_LR_QUEUE_COUNT, 70 + "hw_engine_group_suspend_lr_queue_count"), 71 + DEF_STAT_STR(HW_ENGINE_GROUP_SKIP_LR_QUEUE_COUNT, 72 + "hw_engine_group_skip_lr_queue_count"), 73 + DEF_STAT_STR(HW_ENGINE_GROUP_WAIT_DMA_QUEUE_COUNT, 74 + "hw_engine_group_wait_dma_queue_count"), 75 + DEF_STAT_STR(HW_ENGINE_GROUP_SUSPEND_LR_QUEUE_US, 76 + "hw_engine_group_suspend_lr_queue_us"), 77 + DEF_STAT_STR(HW_ENGINE_GROUP_WAIT_DMA_QUEUE_US, 78 + "hw_engine_group_wait_dma_queue_us"), 69 79 }; 70 80 71 81 /**
+32
drivers/gpu/drm/xe/xe_gt_stats.h
··· 6 6 #ifndef _XE_GT_STATS_H_ 7 7 #define _XE_GT_STATS_H_ 8 8 9 + #include <linux/ktime.h> 10 + 9 11 #include "xe_gt_stats_types.h" 10 12 11 13 struct xe_gt; ··· 25 23 } 26 24 27 25 #endif 26 + 27 + /** 28 + * xe_gt_stats_ktime_us_delta() - Get delta in microseconds between now and a 29 + * start time 30 + * @start: Start time 31 + * 32 + * Helper for GT stats to get delta in microseconds between now and a start 33 + * time, compiles out if GT stats are disabled. 34 + * 35 + * Return: Delta in microseconds between now and a start time 36 + */ 37 + static inline s64 xe_gt_stats_ktime_us_delta(ktime_t start) 38 + { 39 + return IS_ENABLED(CONFIG_DEBUG_FS) ? 40 + ktime_us_delta(ktime_get(), start) : 0; 41 + } 42 + 43 + /** 44 + * xe_gt_stats_ktime_get() - Get current ktime 45 + * 46 + * Helper for GT stats to get current ktime, compiles out if GT stats are 47 + * disabled. 48 + * 49 + * Return: Get current ktime 50 + */ 51 + static inline ktime_t xe_gt_stats_ktime_get(void) 52 + { 53 + return IS_ENABLED(CONFIG_DEBUG_FS) ? ktime_get() : 0; 54 + } 55 + 28 56 #endif
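A sketch of how the new ktime helpers pair with the new *_US and *_COUNT stats, assuming the existing xe_gt_stats_incr() helper; since both helpers compile to constants when debugfs is disabled, the timing code falls away together with the counters:

	ktime_t start = xe_gt_stats_ktime_get();

	/* ... suspend the long-running queues in the group ... */

	xe_gt_stats_incr(gt, XE_GT_STATS_ID_HW_ENGINE_GROUP_SUSPEND_LR_QUEUE_US,
			 xe_gt_stats_ktime_us_delta(start));
	xe_gt_stats_incr(gt, XE_GT_STATS_ID_HW_ENGINE_GROUP_SUSPEND_LR_QUEUE_COUNT, 1);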
+5
drivers/gpu/drm/xe/xe_gt_stats_types.h
··· 44 44 XE_GT_STATS_ID_SVM_4K_BIND_US, 45 45 XE_GT_STATS_ID_SVM_64K_BIND_US, 46 46 XE_GT_STATS_ID_SVM_2M_BIND_US, 47 + XE_GT_STATS_ID_HW_ENGINE_GROUP_SUSPEND_LR_QUEUE_COUNT, 48 + XE_GT_STATS_ID_HW_ENGINE_GROUP_SKIP_LR_QUEUE_COUNT, 49 + XE_GT_STATS_ID_HW_ENGINE_GROUP_WAIT_DMA_QUEUE_COUNT, 50 + XE_GT_STATS_ID_HW_ENGINE_GROUP_SUSPEND_LR_QUEUE_US, 51 + XE_GT_STATS_ID_HW_ENGINE_GROUP_WAIT_DMA_QUEUE_US, 47 52 /* must be the last entry */ 48 53 __XE_GT_STATS_NUM_IDS, 49 54 };
+3 -6
drivers/gpu/drm/xe/xe_gt_throttle.c
··· 85 85 { 86 86 struct xe_device *xe = gt_to_xe(gt); 87 87 struct xe_reg reg; 88 - u32 val, mask; 88 + u32 mask; 89 89 90 90 if (xe_gt_is_media_type(gt)) 91 91 reg = MTL_MEDIA_PERF_LIMIT_REASONS; ··· 97 97 else 98 98 mask = GT0_PERF_LIMIT_REASONS_MASK; 99 99 100 - xe_pm_runtime_get(xe); 101 - val = xe_mmio_read32(&gt->mmio, reg) & mask; 102 - xe_pm_runtime_put(xe); 103 - 104 - return val; 100 + guard(xe_pm_runtime)(xe); 101 + return xe_mmio_read32(&gt->mmio, reg) & mask; 105 102 } 106 103 107 104 static bool is_throttled_by(struct xe_gt *gt, u32 mask)
+5
drivers/gpu/drm/xe/xe_gt_types.h
··· 140 140 u64 engine_mask; 141 141 /** @info.gmdid: raw GMD_ID value from hardware */ 142 142 u32 gmdid; 143 + /** 144 + * @multi_queue_engine_class_mask: Bitmask of engine classes with 145 + * multi queue support enabled. 146 + */ 147 + u16 multi_queue_engine_class_mask; 143 148 /** @info.id: Unique ID of this GT within the PCI Device */ 144 149 u8 id; 145 150 /** @info.has_indirect_ring_state: GT has indirect ring state support */
+60 -20
drivers/gpu/drm/xe/xe_guc.c
··· 104 104 u32 offset = guc_bo_ggtt_addr(guc, guc->log.bo) >> PAGE_SHIFT; 105 105 u32 flags; 106 106 107 - #if (((CRASH_BUFFER_SIZE) % SZ_1M) == 0) 107 + #if (((XE_GUC_LOG_CRASH_DUMP_BUFFER_SIZE) % SZ_1M) == 0) 108 108 #define LOG_UNIT SZ_1M 109 109 #define LOG_FLAG GUC_LOG_LOG_ALLOC_UNITS 110 110 #else ··· 112 112 #define LOG_FLAG 0 113 113 #endif 114 114 115 - #if (((CAPTURE_BUFFER_SIZE) % SZ_1M) == 0) 115 + #if (((XE_GUC_LOG_STATE_CAPTURE_BUFFER_SIZE) % SZ_1M) == 0) 116 116 #define CAPTURE_UNIT SZ_1M 117 117 #define CAPTURE_FLAG GUC_LOG_CAPTURE_ALLOC_UNITS 118 118 #else ··· 120 120 #define CAPTURE_FLAG 0 121 121 #endif 122 122 123 - BUILD_BUG_ON(!CRASH_BUFFER_SIZE); 124 - BUILD_BUG_ON(!IS_ALIGNED(CRASH_BUFFER_SIZE, LOG_UNIT)); 125 - BUILD_BUG_ON(!DEBUG_BUFFER_SIZE); 126 - BUILD_BUG_ON(!IS_ALIGNED(DEBUG_BUFFER_SIZE, LOG_UNIT)); 127 - BUILD_BUG_ON(!CAPTURE_BUFFER_SIZE); 128 - BUILD_BUG_ON(!IS_ALIGNED(CAPTURE_BUFFER_SIZE, CAPTURE_UNIT)); 123 + BUILD_BUG_ON(!XE_GUC_LOG_CRASH_DUMP_BUFFER_SIZE); 124 + BUILD_BUG_ON(!IS_ALIGNED(XE_GUC_LOG_CRASH_DUMP_BUFFER_SIZE, LOG_UNIT)); 125 + BUILD_BUG_ON(!XE_GUC_LOG_EVENT_DATA_BUFFER_SIZE); 126 + BUILD_BUG_ON(!IS_ALIGNED(XE_GUC_LOG_EVENT_DATA_BUFFER_SIZE, LOG_UNIT)); 127 + BUILD_BUG_ON(!XE_GUC_LOG_STATE_CAPTURE_BUFFER_SIZE); 128 + BUILD_BUG_ON(!IS_ALIGNED(XE_GUC_LOG_STATE_CAPTURE_BUFFER_SIZE, CAPTURE_UNIT)); 129 129 130 130 flags = GUC_LOG_VALID | 131 131 GUC_LOG_NOTIFY_ON_HALF_FULL | 132 132 CAPTURE_FLAG | 133 133 LOG_FLAG | 134 - FIELD_PREP(GUC_LOG_CRASH, CRASH_BUFFER_SIZE / LOG_UNIT - 1) | 135 - FIELD_PREP(GUC_LOG_DEBUG, DEBUG_BUFFER_SIZE / LOG_UNIT - 1) | 136 - FIELD_PREP(GUC_LOG_CAPTURE, CAPTURE_BUFFER_SIZE / CAPTURE_UNIT - 1) | 134 + FIELD_PREP(GUC_LOG_CRASH_DUMP, XE_GUC_LOG_CRASH_DUMP_BUFFER_SIZE / LOG_UNIT - 1) | 135 + FIELD_PREP(GUC_LOG_EVENT_DATA, XE_GUC_LOG_EVENT_DATA_BUFFER_SIZE / LOG_UNIT - 1) | 136 + FIELD_PREP(GUC_LOG_STATE_CAPTURE, XE_GUC_LOG_STATE_CAPTURE_BUFFER_SIZE / 137 + CAPTURE_UNIT - 1) | 137 138 FIELD_PREP(GUC_LOG_BUF_ADDR, offset); 138 139 139 140 #undef LOG_UNIT ··· 661 660 { 662 661 struct xe_guc *guc = arg; 663 662 struct xe_gt *gt = guc_to_gt(guc); 664 - unsigned int fw_ref; 665 663 666 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL); 667 - xe_uc_sanitize_reset(&guc_to_gt(guc)->uc); 668 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 664 + xe_with_force_wake(fw_ref, gt_to_fw(gt), XE_FORCEWAKE_ALL) 665 + xe_uc_sanitize_reset(&guc_to_gt(guc)->uc); 669 666 670 667 guc_g2g_fini(guc); 671 668 } ··· 766 767 767 768 if (!xe_uc_fw_is_enabled(&guc->fw)) 768 769 return 0; 770 + 771 + /* Disable page reclaim if GuC FW does not support */ 772 + if (GUC_SUBMIT_VER(guc) < MAKE_GUC_VER(1, 14, 0)) 773 + xe->info.has_page_reclaim_hw_assist = false; 769 774 770 775 if (IS_SRIOV_VF(xe)) { 771 776 ret = xe_guc_ct_init(&guc->ct); ··· 1488 1485 u32 hint = FIELD_GET(GUC_HXG_FAILURE_MSG_0_HINT, header); 1489 1486 u32 error = FIELD_GET(GUC_HXG_FAILURE_MSG_0_ERROR, header); 1490 1487 1488 + if (unlikely(error == XE_GUC_RESPONSE_VF_MIGRATED)) { 1489 + xe_gt_dbg(gt, "GuC mmio request %#x rejected due to MIGRATION (hint %#x)\n", 1490 + request[0], hint); 1491 + return -EREMCHG; 1492 + } 1493 + 1491 1494 xe_gt_err(gt, "GuC mmio request %#x: failure %#x hint %#x\n", 1492 1495 request[0], error, hint); 1493 1496 return -ENXIO; ··· 1627 1618 return xe_guc_submit_start(guc); 1628 1619 } 1629 1620 1621 + /** 1622 + * xe_guc_runtime_suspend() - GuC runtime suspend 1623 + * @guc: The GuC object 1624 + * 1625 + * Stop further runs of submission tasks on 
given GuC and runtime suspend 1626 + * GuC CT. 1627 + */ 1628 + void xe_guc_runtime_suspend(struct xe_guc *guc) 1629 + { 1630 + xe_guc_submit_pause(guc); 1631 + xe_guc_submit_disable(guc); 1632 + xe_guc_ct_runtime_suspend(&guc->ct); 1633 + } 1634 + 1635 + /** 1636 + * xe_guc_runtime_resume() - GuC runtime resume 1637 + * @guc: The GuC object 1638 + * 1639 + * Runtime resume GuC CT and allow further runs of submission tasks on 1640 + * given GuC. 1641 + */ 1642 + void xe_guc_runtime_resume(struct xe_guc *guc) 1643 + { 1644 + /* 1645 + * Runtime PM flows are not applicable for VFs, so it's safe to 1646 + * directly enable IRQ. 1647 + */ 1648 + guc_enable_irq(guc); 1649 + 1650 + xe_guc_ct_runtime_resume(&guc->ct); 1651 + xe_guc_submit_enable(guc); 1652 + xe_guc_submit_unpause(guc); 1653 + } 1654 + 1630 1655 void xe_guc_print_info(struct xe_guc *guc, struct drm_printer *p) 1631 1656 { 1632 1657 struct xe_gt *gt = guc_to_gt(guc); 1633 - unsigned int fw_ref; 1634 1658 u32 status; 1635 1659 int i; 1636 1660 1637 1661 xe_uc_fw_print(&guc->fw, p); 1638 1662 1639 1663 if (!IS_SRIOV_VF(gt_to_xe(gt))) { 1640 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); 1641 - if (!fw_ref) 1664 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT); 1665 + if (!fw_ref.domains) 1642 1666 return; 1643 1667 1644 1668 status = xe_mmio_read32(&gt->mmio, GUC_STATUS); ··· 1691 1649 drm_printf(p, "\t%2d: \t0x%x\n", 1692 1650 i, xe_mmio_read32(&gt->mmio, SOFT_SCRATCH(i))); 1693 1651 } 1694 - 1695 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 1696 1652 } 1697 1653 1698 1654 drm_puts(p, "\n");
+23
drivers/gpu/drm/xe/xe_guc.h
··· 18 18 */ 19 19 #define MAKE_GUC_VER(maj, min, pat) (((maj) << 16) | ((min) << 8) | (pat)) 20 20 #define MAKE_GUC_VER_STRUCT(ver) MAKE_GUC_VER((ver).major, (ver).minor, (ver).patch) 21 + #define MAKE_GUC_VER_ARGS(ver...) \ 22 + (BUILD_BUG_ON_ZERO(COUNT_ARGS(ver) < 2 || COUNT_ARGS(ver) > 3) + \ 23 + MAKE_GUC_VER(PICK_ARG1(ver), PICK_ARG2(ver), IF_ARGS(PICK_ARG3(ver), 0, PICK_ARG3(ver)))) 24 + 21 25 #define GUC_SUBMIT_VER(guc) \ 22 26 MAKE_GUC_VER_STRUCT((guc)->fw.versions.found[XE_UC_FW_VER_COMPATIBILITY]) 23 27 #define GUC_FIRMWARE_VER(guc) \ 24 28 MAKE_GUC_VER_STRUCT((guc)->fw.versions.found[XE_UC_FW_VER_RELEASE]) 29 + #define GUC_FIRMWARE_VER_AT_LEAST(guc, ver...) \ 30 + xe_guc_fw_version_at_least((guc), MAKE_GUC_VER_ARGS(ver)) 25 31 26 32 struct drm_printer; 27 33 ··· 41 35 int xe_guc_min_load_for_hwconfig(struct xe_guc *guc); 42 36 int xe_guc_enable_communication(struct xe_guc *guc); 43 37 int xe_guc_opt_in_features_enable(struct xe_guc *guc); 38 + void xe_guc_runtime_suspend(struct xe_guc *guc); 39 + void xe_guc_runtime_resume(struct xe_guc *guc); 44 40 int xe_guc_suspend(struct xe_guc *guc); 45 41 void xe_guc_notify(struct xe_guc *guc); 46 42 int xe_guc_auth_huc(struct xe_guc *guc, u32 rsa_addr); ··· 100 92 static inline struct drm_device *guc_to_drm(struct xe_guc *guc) 101 93 { 102 94 return &guc_to_xe(guc)->drm; 95 + } 96 + 97 + /** 98 + * xe_guc_fw_version_at_least() - Check if GuC is at least of given version. 99 + * @guc: the &xe_guc 100 + * @ver: the version to check 101 + * 102 + * The @ver should be prepared using MAKE_GUC_VER(major, minor, patch). 103 + * 104 + * Return: true if loaded GuC firmware is at least of given version, 105 + * false otherwise. 106 + */ 107 + static inline bool xe_guc_fw_version_at_least(const struct xe_guc *guc, u32 ver) 108 + { 109 + return GUC_FIRMWARE_VER(guc) >= ver; 103 110 } 104 111 105 112 #endif
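A sketch of the intended call sites: the variadic macro accepts two or three version components (the patch level defaults to 0) and collapses to a single integer comparison, so the feature gating used elsewhere in this series reads as:

	/* two-component form, as in the SR-IOV migration check */
	if (!GUC_FIRMWARE_VER_AT_LEAST(&gt->uc.guc, 70, 54))
		xe_sriov_pf_migration_disable(gt_to_xe(gt),
					      "requires GuC version >= 70.54.0");

	/* three-component form is also accepted */
	if (GUC_FIRMWARE_VER_AT_LEAST(&gt->uc.guc, 70, 44, 0) && XE_GT_WA(gt, 16026508708))
		/* ... enable the workaround KLV ... */;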
+3 -3
drivers/gpu/drm/xe/xe_guc_ads.c
··· 317 317 offset = guc_ads_waklv_offset(ads); 318 318 remain = guc_ads_waklv_size(ads); 319 319 320 - if (XE_GT_WA(gt, 14019882105) || XE_GT_WA(gt, 16021333562)) 320 + if (XE_GT_WA(gt, 16021333562)) 321 321 guc_waklv_enable(ads, NULL, 0, &offset, &remain, 322 322 GUC_WORKAROUND_KLV_BLOCK_INTERRUPTS_WHEN_MGSR_BLOCKED); 323 323 if (XE_GT_WA(gt, 18024947630)) ··· 347 347 guc_waklv_enable(ads, NULL, 0, &offset, &remain, 348 348 GUC_WORKAROUND_KLV_ID_BACK_TO_BACK_RCS_ENGINE_RESET); 349 349 350 - if (GUC_FIRMWARE_VER(&gt->uc.guc) >= MAKE_GUC_VER(70, 44, 0) && XE_GT_WA(gt, 16026508708)) 350 + if (GUC_FIRMWARE_VER_AT_LEAST(&gt->uc.guc, 70, 44) && XE_GT_WA(gt, 16026508708)) 351 351 guc_waklv_enable(ads, NULL, 0, &offset, &remain, 352 352 GUC_WA_KLV_RESET_BB_STACK_PTR_ON_VF_SWITCH); 353 - if (GUC_FIRMWARE_VER(&gt->uc.guc) >= MAKE_GUC_VER(70, 47, 0) && XE_GT_WA(gt, 16026007364)) { 353 + if (GUC_FIRMWARE_VER_AT_LEAST(&gt->uc.guc, 70, 47) && XE_GT_WA(gt, 16026007364)) { 354 354 u32 data[] = { 355 355 0x0, 356 356 0xF,
+1 -1
drivers/gpu/drm/xe/xe_guc_buf.c
··· 30 30 struct xe_gt *gt = cache_to_gt(cache); 31 31 struct xe_sa_manager *sam; 32 32 33 - sam = __xe_sa_bo_manager_init(gt_to_tile(gt), size, 0, sizeof(u32)); 33 + sam = __xe_sa_bo_manager_init(gt_to_tile(gt), size, 0, sizeof(u32), 0); 34 34 if (IS_ERR(sam)) 35 35 return PTR_ERR(sam); 36 36 cache->sam = sam;
+8 -8
drivers/gpu/drm/xe/xe_guc_capture.c
··· 843 843 { 844 844 int capture_size = guc_capture_output_size_est(guc); 845 845 int spare_size = capture_size * GUC_CAPTURE_OVERBUFFER_MULTIPLIER; 846 - u32 buffer_size = xe_guc_log_section_size_capture(&guc->log); 846 + u32 buffer_size = XE_GUC_LOG_STATE_CAPTURE_BUFFER_SIZE; 847 847 848 848 /* 849 849 * NOTE: capture_size is much smaller than the capture region ··· 949 949 * ADS module also calls separately for PF vs VF. 950 950 * 951 951 * --> alloc B: GuC output capture buf (registered via guc_init_params(log_param)) 952 - * Size = #define CAPTURE_BUFFER_SIZE (warns if on too-small) 952 + * Size = XE_GUC_LOG_STATE_CAPTURE_BUFFER_SIZE (warns if on too-small) 953 953 * Note2: 'x 3' to hold multiple capture groups 954 954 * 955 955 * GUC Runtime notify capture: ··· 1367 1367 { 1368 1368 u32 action[] = { 1369 1369 XE_GUC_ACTION_LOG_BUFFER_FILE_FLUSH_COMPLETE, 1370 - GUC_LOG_BUFFER_CAPTURE 1370 + GUC_LOG_TYPE_STATE_CAPTURE 1371 1371 }; 1372 1372 1373 1373 return xe_guc_ct_send_g2h_handler(&guc->ct, action, ARRAY_SIZE(action)); ··· 1384 1384 u32 log_buf_state_offset; 1385 1385 u32 src_data_offset; 1386 1386 1387 - log_buf_state_offset = sizeof(struct guc_log_buffer_state) * GUC_LOG_BUFFER_CAPTURE; 1388 - src_data_offset = xe_guc_get_log_buffer_offset(&guc->log, GUC_LOG_BUFFER_CAPTURE); 1387 + log_buf_state_offset = sizeof(struct guc_log_buffer_state) * GUC_LOG_TYPE_STATE_CAPTURE; 1388 + src_data_offset = XE_GUC_LOG_STATE_CAPTURE_OFFSET; 1389 1389 1390 1390 /* 1391 1391 * Make a copy of the state structure, inside GuC log buffer ··· 1395 1395 xe_map_memcpy_from(guc_to_xe(guc), &log_buf_state_local, &guc->log.bo->vmap, 1396 1396 log_buf_state_offset, sizeof(struct guc_log_buffer_state)); 1397 1397 1398 - buffer_size = xe_guc_get_log_buffer_size(&guc->log, GUC_LOG_BUFFER_CAPTURE); 1398 + buffer_size = XE_GUC_LOG_STATE_CAPTURE_BUFFER_SIZE; 1399 1399 read_offset = log_buf_state_local.read_ptr; 1400 1400 write_offset = log_buf_state_local.sampled_write_ptr; 1401 1401 full_count = FIELD_GET(GUC_LOG_BUFFER_STATE_BUFFER_FULL_CNT, log_buf_state_local.flags); 1402 1402 1403 1403 /* Bookkeeping stuff */ 1404 1404 tmp = FIELD_GET(GUC_LOG_BUFFER_STATE_FLUSH_TO_FILE, log_buf_state_local.flags); 1405 - guc->log.stats[GUC_LOG_BUFFER_CAPTURE].flush += tmp; 1406 - new_overflow = xe_guc_check_log_buf_overflow(&guc->log, GUC_LOG_BUFFER_CAPTURE, 1405 + guc->log.stats[GUC_LOG_TYPE_STATE_CAPTURE].flush += tmp; 1406 + new_overflow = xe_guc_check_log_buf_overflow(&guc->log, GUC_LOG_TYPE_STATE_CAPTURE, 1407 1407 full_count); 1408 1408 1409 1409 /* Now copy the actual logs. */
+170 -103
drivers/gpu/drm/xe/xe_guc_ct.c
··· 42 42 static void guc_ct_change_state(struct xe_guc_ct *ct, 43 43 enum xe_guc_ct_state state); 44 44 45 + static struct xe_guc *ct_to_guc(struct xe_guc_ct *ct) 46 + { 47 + return container_of(ct, struct xe_guc, ct); 48 + } 49 + 50 + static struct xe_gt *ct_to_gt(struct xe_guc_ct *ct) 51 + { 52 + return container_of(ct, struct xe_gt, uc.guc.ct); 53 + } 54 + 55 + static struct xe_device *ct_to_xe(struct xe_guc_ct *ct) 56 + { 57 + return gt_to_xe(ct_to_gt(ct)); 58 + } 59 + 45 60 #if IS_ENABLED(CONFIG_DRM_XE_DEBUG) 46 61 enum { 47 62 /* Internal states, not error conditions */ ··· 83 68 static void ct_dead_worker_func(struct work_struct *w); 84 69 static void ct_dead_capture(struct xe_guc_ct *ct, struct guc_ctb *ctb, u32 reason_code); 85 70 86 - #define CT_DEAD(ct, ctb, reason_code) ct_dead_capture((ct), (ctb), CT_DEAD_##reason_code) 71 + static void ct_dead_fini(struct xe_guc_ct *ct) 72 + { 73 + cancel_work_sync(&ct->dead.worker); 74 + } 75 + 76 + static void ct_dead_init(struct xe_guc_ct *ct) 77 + { 78 + spin_lock_init(&ct->dead.lock); 79 + INIT_WORK(&ct->dead.worker, ct_dead_worker_func); 80 + 81 + #if IS_ENABLED(CONFIG_DRM_XE_DEBUG_GUC) 82 + stack_depot_init(); 83 + #endif 84 + } 85 + 86 + static void fast_req_stack_save(struct xe_guc_ct *ct, unsigned int slot) 87 + { 88 + #if IS_ENABLED(CONFIG_DRM_XE_DEBUG_GUC) 89 + unsigned long entries[SZ_32]; 90 + unsigned int n; 91 + 92 + n = stack_trace_save(entries, ARRAY_SIZE(entries), 1); 93 + /* May be called under spinlock, so avoid sleeping */ 94 + ct->fast_req[slot].stack = stack_depot_save(entries, n, GFP_NOWAIT); 95 + #endif 96 + } 97 + 98 + static void fast_req_dump(struct xe_guc_ct *ct, u16 fence, unsigned int slot) 99 + { 100 + struct xe_gt *gt = ct_to_gt(ct); 101 + #if IS_ENABLED(CONFIG_DRM_XE_DEBUG_GUC) 102 + char *buf __cleanup(kfree) = kmalloc(SZ_4K, GFP_NOWAIT); 103 + 104 + if (buf && stack_depot_snprint(ct->fast_req[slot].stack, buf, SZ_4K, 0)) 105 + xe_gt_err(gt, "Fence 0x%x was used by action %#04x sent at:\n%s\n", 106 + fence, ct->fast_req[slot].action, buf); 107 + else 108 + xe_gt_err(gt, "Fence 0x%x was used by action %#04x [failed to retrieve stack]\n", 109 + fence, ct->fast_req[slot].action); 87 110 #else 111 + xe_gt_err(gt, "Fence 0x%x was used by action %#04x\n", 112 + fence, ct->fast_req[slot].action); 113 + #endif 114 + } 115 + 116 + static void fast_req_report(struct xe_guc_ct *ct, u16 fence) 117 + { 118 + u16 fence_min = U16_MAX, fence_max = 0; 119 + struct xe_gt *gt = ct_to_gt(ct); 120 + unsigned int n; 121 + 122 + lockdep_assert_held(&ct->lock); 123 + 124 + for (n = 0; n < ARRAY_SIZE(ct->fast_req); n++) { 125 + if (ct->fast_req[n].fence < fence_min) 126 + fence_min = ct->fast_req[n].fence; 127 + if (ct->fast_req[n].fence > fence_max) 128 + fence_max = ct->fast_req[n].fence; 129 + 130 + if (ct->fast_req[n].fence != fence) 131 + continue; 132 + 133 + return fast_req_dump(ct, fence, n); 134 + } 135 + 136 + xe_gt_warn(gt, "Fence 0x%x not found - tracking buffer wrapped? 
[range = 0x%x -> 0x%x, next = 0x%X]\n", 137 + fence, fence_min, fence_max, ct->fence_seqno); 138 + } 139 + 140 + static void fast_req_track(struct xe_guc_ct *ct, u16 fence, u16 action) 141 + { 142 + unsigned int slot = fence % ARRAY_SIZE(ct->fast_req); 143 + 144 + fast_req_stack_save(ct, slot); 145 + ct->fast_req[slot].fence = fence; 146 + ct->fast_req[slot].action = action; 147 + } 148 + 149 + #define CT_DEAD(ct, ctb, reason_code) ct_dead_capture((ct), (ctb), CT_DEAD_##reason_code) 150 + 151 + #else 152 + 153 + static void ct_dead_fini(struct xe_guc_ct *ct) { } 154 + static void ct_dead_init(struct xe_guc_ct *ct) { } 155 + 156 + static void fast_req_report(struct xe_guc_ct *ct, u16 fence) { } 157 + static void fast_req_track(struct xe_guc_ct *ct, u16 fence, u16 action) { } 158 + 88 159 #define CT_DEAD(ct, ctb, reason) \ 89 160 do { \ 90 161 struct guc_ctb *_ctb = (ctb); \ 91 162 if (_ctb) \ 92 163 _ctb->info.broken = true; \ 93 164 } while (0) 165 + 94 166 #endif 95 167 96 168 /* Used when a CT send wants to block and / or receive data */ ··· 212 110 static bool g2h_fence_needs_alloc(struct g2h_fence *g2h_fence) 213 111 { 214 112 return g2h_fence->seqno == ~0x0; 215 - } 216 - 217 - static struct xe_guc * 218 - ct_to_guc(struct xe_guc_ct *ct) 219 - { 220 - return container_of(ct, struct xe_guc, ct); 221 - } 222 - 223 - static struct xe_gt * 224 - ct_to_gt(struct xe_guc_ct *ct) 225 - { 226 - return container_of(ct, struct xe_gt, uc.guc.ct); 227 - } 228 - 229 - static struct xe_device * 230 - ct_to_xe(struct xe_guc_ct *ct) 231 - { 232 - return gt_to_xe(ct_to_gt(ct)); 233 113 } 234 114 235 115 /** ··· 253 169 #define CTB_DESC_SIZE ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K) 254 170 #define CTB_H2G_BUFFER_OFFSET (CTB_DESC_SIZE * 2) 255 171 #define CTB_H2G_BUFFER_SIZE (SZ_4K) 172 + #define CTB_H2G_BUFFER_DWORDS (CTB_H2G_BUFFER_SIZE / sizeof(u32)) 256 173 #define CTB_G2H_BUFFER_SIZE (SZ_128K) 174 + #define CTB_G2H_BUFFER_DWORDS (CTB_G2H_BUFFER_SIZE / sizeof(u32)) 257 175 #define G2H_ROOM_BUFFER_SIZE (CTB_G2H_BUFFER_SIZE / 2) 176 + #define G2H_ROOM_BUFFER_DWORDS (CTB_G2H_BUFFER_DWORDS / 2) 258 177 259 178 /** 260 179 * xe_guc_ct_queue_proc_time_jiffies - Return maximum time to process a full ··· 286 199 { 287 200 struct xe_guc_ct *ct = arg; 288 201 289 - #if IS_ENABLED(CONFIG_DRM_XE_DEBUG) 290 - cancel_work_sync(&ct->dead.worker); 291 - #endif 202 + ct_dead_fini(ct); 292 203 ct_exit_safe_mode(ct); 293 204 destroy_workqueue(ct->g2h_wq); 294 205 xa_destroy(&ct->fence_lookup); ··· 324 239 xa_init(&ct->fence_lookup); 325 240 INIT_WORK(&ct->g2h_worker, g2h_worker_func); 326 241 INIT_DELAYED_WORK(&ct->safe_mode_worker, safe_mode_worker_func); 327 - #if IS_ENABLED(CONFIG_DRM_XE_DEBUG) 328 - spin_lock_init(&ct->dead.lock); 329 - INIT_WORK(&ct->dead.worker, ct_dead_worker_func); 330 - #if IS_ENABLED(CONFIG_DRM_XE_DEBUG_GUC) 331 - stack_depot_init(); 332 - #endif 333 - #endif 242 + 243 + ct_dead_init(ct); 334 244 init_waitqueue_head(&ct->wq); 335 245 init_waitqueue_head(&ct->g2h_fence_wq); 336 246 ··· 406 326 static void guc_ct_ctb_h2g_init(struct xe_device *xe, struct guc_ctb *h2g, 407 327 struct iosys_map *map) 408 328 { 409 - h2g->info.size = CTB_H2G_BUFFER_SIZE / sizeof(u32); 329 + h2g->info.size = CTB_H2G_BUFFER_DWORDS; 410 330 h2g->info.resv_space = 0; 411 331 h2g->info.tail = 0; 412 332 h2g->info.head = 0; ··· 424 344 static void guc_ct_ctb_g2h_init(struct xe_device *xe, struct guc_ctb *g2h, 425 345 struct iosys_map *map) 426 346 { 427 - g2h->info.size = CTB_G2H_BUFFER_SIZE / sizeof(u32); 428 - 
g2h->info.resv_space = G2H_ROOM_BUFFER_SIZE / sizeof(u32); 347 + g2h->info.size = CTB_G2H_BUFFER_DWORDS; 348 + g2h->info.resv_space = G2H_ROOM_BUFFER_DWORDS; 429 349 g2h->info.head = 0; 430 350 g2h->info.tail = 0; 431 351 g2h->info.space = CIRC_SPACE(g2h->info.tail, g2h->info.head, ··· 720 640 stop_g2h_handler(ct); 721 641 } 722 642 643 + /** 644 + * xe_guc_ct_runtime_suspend() - GuC CT runtime suspend 645 + * @ct: the &xe_guc_ct 646 + * 647 + * Set GuC CT to disabled state. 648 + */ 649 + void xe_guc_ct_runtime_suspend(struct xe_guc_ct *ct) 650 + { 651 + struct guc_ctb *g2h = &ct->ctbs.g2h; 652 + u32 credits = CIRC_SPACE(0, 0, CTB_G2H_BUFFER_DWORDS) - G2H_ROOM_BUFFER_DWORDS; 653 + 654 + /* We should be back to guc_ct_ctb_g2h_init() values */ 655 + xe_gt_assert(ct_to_gt(ct), g2h->info.space == credits); 656 + 657 + /* 658 + * Since we're already in runtime suspend path, we shouldn't have pending 659 + * messages. But if there happen to be any, we'd probably want them to be 660 + * thrown as errors for further investigation. 661 + */ 662 + xe_guc_ct_disable(ct); 663 + } 664 + 665 + /** 666 + * xe_guc_ct_runtime_resume() - GuC CT runtime resume 667 + * @ct: the &xe_guc_ct 668 + * 669 + * Restart GuC CT and set it to enabled state. 670 + */ 671 + void xe_guc_ct_runtime_resume(struct xe_guc_ct *ct) 672 + { 673 + xe_guc_ct_restart(ct); 674 + } 675 + 723 676 static bool h2g_has_room(struct xe_guc_ct *ct, u32 cmd_len) 724 677 { 725 678 struct guc_ctb *h2g = &ct->ctbs.h2g; ··· 859 746 __g2h_release_space(ct, g2h_len); 860 747 spin_unlock_irq(&ct->fast_lock); 861 748 } 862 - 863 - #if IS_ENABLED(CONFIG_DRM_XE_DEBUG) 864 - static void fast_req_track(struct xe_guc_ct *ct, u16 fence, u16 action) 865 - { 866 - unsigned int slot = fence % ARRAY_SIZE(ct->fast_req); 867 - #if IS_ENABLED(CONFIG_DRM_XE_DEBUG_GUC) 868 - unsigned long entries[SZ_32]; 869 - unsigned int n; 870 - 871 - n = stack_trace_save(entries, ARRAY_SIZE(entries), 1); 872 - 873 - /* May be called under spinlock, so avoid sleeping */ 874 - ct->fast_req[slot].stack = stack_depot_save(entries, n, GFP_NOWAIT); 875 - #endif 876 - ct->fast_req[slot].fence = fence; 877 - ct->fast_req[slot].action = action; 878 - } 879 - #else 880 - static void fast_req_track(struct xe_guc_ct *ct, u16 fence, u16 action) 881 - { 882 - } 883 - #endif 884 749 885 750 /* 886 751 * The CT protocol accepts a 16 bits fence. 
This field is fully owned by the ··· 1401 1310 lockdep_assert_held(&ct->lock); 1402 1311 1403 1312 switch (action) { 1313 + case XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE: 1404 1314 case XE_GUC_ACTION_SCHED_CONTEXT_MODE_DONE: 1405 1315 case XE_GUC_ACTION_DEREGISTER_CONTEXT_DONE: 1406 1316 case XE_GUC_ACTION_SCHED_ENGINE_MODE_DONE: 1407 1317 case XE_GUC_ACTION_TLB_INVALIDATION_DONE: 1318 + case XE_GUC_ACTION_PAGE_RECLAMATION_DONE: 1408 1319 g2h_release_space(ct, len); 1409 1320 } 1410 1321 ··· 1430 1337 1431 1338 return 0; 1432 1339 } 1433 - 1434 - #if IS_ENABLED(CONFIG_DRM_XE_DEBUG) 1435 - static void fast_req_report(struct xe_guc_ct *ct, u16 fence) 1436 - { 1437 - u16 fence_min = U16_MAX, fence_max = 0; 1438 - struct xe_gt *gt = ct_to_gt(ct); 1439 - bool found = false; 1440 - unsigned int n; 1441 - #if IS_ENABLED(CONFIG_DRM_XE_DEBUG_GUC) 1442 - char *buf; 1443 - #endif 1444 - 1445 - lockdep_assert_held(&ct->lock); 1446 - 1447 - for (n = 0; n < ARRAY_SIZE(ct->fast_req); n++) { 1448 - if (ct->fast_req[n].fence < fence_min) 1449 - fence_min = ct->fast_req[n].fence; 1450 - if (ct->fast_req[n].fence > fence_max) 1451 - fence_max = ct->fast_req[n].fence; 1452 - 1453 - if (ct->fast_req[n].fence != fence) 1454 - continue; 1455 - found = true; 1456 - 1457 - #if IS_ENABLED(CONFIG_DRM_XE_DEBUG_GUC) 1458 - buf = kmalloc(SZ_4K, GFP_NOWAIT); 1459 - if (buf && stack_depot_snprint(ct->fast_req[n].stack, buf, SZ_4K, 0)) 1460 - xe_gt_err(gt, "Fence 0x%x was used by action %#04x sent at:\n%s", 1461 - fence, ct->fast_req[n].action, buf); 1462 - else 1463 - xe_gt_err(gt, "Fence 0x%x was used by action %#04x [failed to retrieve stack]\n", 1464 - fence, ct->fast_req[n].action); 1465 - kfree(buf); 1466 - #else 1467 - xe_gt_err(gt, "Fence 0x%x was used by action %#04x\n", 1468 - fence, ct->fast_req[n].action); 1469 - #endif 1470 - break; 1471 - } 1472 - 1473 - if (!found) 1474 - xe_gt_warn(gt, "Fence 0x%x not found - tracking buffer wrapped? [range = 0x%x -> 0x%x, next = 0x%X]\n", 1475 - fence, fence_min, fence_max, ct->fence_seqno); 1476 - } 1477 - #else 1478 - static void fast_req_report(struct xe_guc_ct *ct, u16 fence) 1479 - { 1480 - } 1481 - #endif 1482 1340 1483 1341 static int parse_g2h_response(struct xe_guc_ct *ct, u32 *msg, u32 len) 1484 1342 { ··· 1593 1549 ret = xe_guc_pagefault_handler(guc, payload, adj_len); 1594 1550 break; 1595 1551 case XE_GUC_ACTION_TLB_INVALIDATION_DONE: 1552 + case XE_GUC_ACTION_PAGE_RECLAMATION_DONE: 1553 + /* 1554 + * Page reclamation is an extension of TLB invalidation. Both 1555 + * operations share the same seqno and fence. When either 1556 + * action completes, we need to signal the corresponding 1557 + * fence. Since the handling logic (lookup fence by seqno, 1558 + * fence signalling) is identical, we use the same handler 1559 + * for both G2H events. 
1560 + */ 1596 1561 ret = xe_guc_tlb_inval_done_handler(guc, payload, adj_len); 1597 1562 break; 1598 1563 case XE_GUC_ACTION_GUC2PF_RELAY_FROM_VF: ··· 1625 1572 ret = xe_guc_g2g_test_notification(guc, payload, adj_len); 1626 1573 break; 1627 1574 #endif 1575 + case XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE: 1576 + ret = xe_guc_exec_queue_cgp_sync_done_handler(guc, payload, adj_len); 1577 + break; 1578 + case XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CGP_CONTEXT_ERROR: 1579 + ret = xe_guc_exec_queue_cgp_context_error_handler(guc, payload, 1580 + adj_len); 1581 + break; 1628 1582 default: 1629 1583 xe_gt_err(gt, "unexpected G2H action 0x%04x\n", action); 1630 1584 } ··· 1774 1714 switch (action) { 1775 1715 case XE_GUC_ACTION_REPORT_PAGE_FAULT_REQ_DESC: 1776 1716 case XE_GUC_ACTION_TLB_INVALIDATION_DONE: 1717 + case XE_GUC_ACTION_PAGE_RECLAMATION_DONE: 1777 1718 break; /* Process these in fast-path */ 1778 1719 default: 1779 1720 return 0; ··· 1811 1750 ret = xe_guc_pagefault_handler(guc, payload, adj_len); 1812 1751 break; 1813 1752 case XE_GUC_ACTION_TLB_INVALIDATION_DONE: 1753 + case XE_GUC_ACTION_PAGE_RECLAMATION_DONE: 1754 + /* 1755 + * Seqno and fence handling of page reclamation and TLB 1756 + * invalidation is identical, so we can use the same handler 1757 + * for both actions. 1758 + */ 1814 1759 __g2h_release_space(ct, len); 1815 1760 ret = xe_guc_tlb_inval_done_handler(guc, payload, adj_len); 1816 1761 break;
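The fast_req bookkeeping consolidated above records each fire-and-forget H2G fence in a fixed array indexed by fence modulo array size, which is why the report path can miss an entry once the window wraps. A standalone sketch of that lookup scheme, using hypothetical names rather than the driver's structures:

	#include <stdbool.h>
	#include <stdint.h>

	#define SLOTS 256				/* tracking window */

	static struct fast_req { uint16_t fence; uint16_t action; } reqs[SLOTS];

	static void track(uint16_t fence, uint16_t action)
	{
		reqs[fence % SLOTS] = (struct fast_req){ fence, action };
	}

	static bool lookup(uint16_t fence, uint16_t *action)
	{
		struct fast_req *r = &reqs[fence % SLOTS];

		if (r->fence != fence)		/* slot reused after a wrap */
			return false;
		*action = r->action;
		return true;
	}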
+2
drivers/gpu/drm/xe/xe_guc_ct.h
··· 17 17 int xe_guc_ct_enable(struct xe_guc_ct *ct); 18 18 int xe_guc_ct_restart(struct xe_guc_ct *ct); 19 19 void xe_guc_ct_disable(struct xe_guc_ct *ct); 20 + void xe_guc_ct_runtime_resume(struct xe_guc_ct *ct); 21 + void xe_guc_ct_runtime_suspend(struct xe_guc_ct *ct); 20 22 void xe_guc_ct_stop(struct xe_guc_ct *ct); 21 23 void xe_guc_ct_flush_and_stop(struct xe_guc_ct *ct); 22 24 void xe_guc_ct_fast_path(struct xe_guc_ct *ct);
+9 -6
drivers/gpu/drm/xe/xe_guc_debugfs.c
··· 70 70 struct xe_gt *gt = grandparent->d_inode->i_private; 71 71 struct xe_device *xe = gt_to_xe(gt); 72 72 int (*print)(struct xe_guc *, struct drm_printer *) = node->info_ent->data; 73 - int ret; 74 73 75 - xe_pm_runtime_get(xe); 76 - ret = print(&gt->uc.guc, &p); 77 - xe_pm_runtime_put(xe); 78 - 79 - return ret; 74 + guard(xe_pm_runtime)(xe); 75 + return print(&gt->uc.guc, &p); 80 76 } 81 77 82 78 static int guc_log(struct xe_guc *guc, struct drm_printer *p) 83 79 { 84 80 xe_guc_log_print(&guc->log, p); 81 + return 0; 82 + } 83 + 84 + static int guc_log_lfd(struct xe_guc *guc, struct drm_printer *p) 85 + { 86 + xe_guc_log_print_lfd(&guc->log, p); 85 87 return 0; 86 88 } 87 89 ··· 123 121 /* everything else should be added here */ 124 122 static const struct drm_info_list pf_only_debugfs_list[] = { 125 123 { "guc_log", .show = guc_debugfs_show, .data = guc_log }, 124 + { "guc_log_lfd", .show = guc_debugfs_show, .data = guc_log_lfd }, 126 125 { "guc_log_dmesg", .show = guc_debugfs_show, .data = guc_log_dmesg }, 127 126 }; 128 127
+7 -3
drivers/gpu/drm/xe/xe_guc_fwif.h
··· 16 16 #define G2H_LEN_DW_DEREGISTER_CONTEXT 3 17 17 #define G2H_LEN_DW_TLB_INVALIDATE 3 18 18 #define G2H_LEN_DW_G2G_NOTIFY_MIN 3 19 + #define G2H_LEN_DW_MULTI_QUEUE_CONTEXT 3 20 + #define G2H_LEN_DW_PAGE_RECLAMATION 3 19 21 20 22 #define GUC_ID_MAX 65535 21 23 #define GUC_ID_UNKNOWN 0xffffffff ··· 64 62 u32 wq_base_lo; 65 63 u32 wq_base_hi; 66 64 u32 wq_size; 65 + u32 cgp_lo; 66 + u32 cgp_hi; 67 67 u32 hwlrca_lo; 68 68 u32 hwlrca_hi; 69 69 }; ··· 95 91 #define GUC_LOG_NOTIFY_ON_HALF_FULL BIT(1) 96 92 #define GUC_LOG_CAPTURE_ALLOC_UNITS BIT(2) 97 93 #define GUC_LOG_LOG_ALLOC_UNITS BIT(3) 98 - #define GUC_LOG_CRASH REG_GENMASK(5, 4) 99 - #define GUC_LOG_DEBUG REG_GENMASK(9, 6) 100 - #define GUC_LOG_CAPTURE REG_GENMASK(11, 10) 94 + #define GUC_LOG_CRASH_DUMP REG_GENMASK(5, 4) 95 + #define GUC_LOG_EVENT_DATA REG_GENMASK(9, 6) 96 + #define GUC_LOG_STATE_CAPTURE REG_GENMASK(11, 10) 101 97 #define GUC_LOG_BUF_ADDR REG_GENMASK(31, 12) 102 98 103 99 #define GUC_CTL_WA 1
+6
drivers/gpu/drm/xe/xe_guc_klv_thresholds_set_types.h
··· 24 24 * ABI and the associated &NAME, that may be used in code or debugfs/sysfs:: 25 25 * 26 26 * define(TAG, NAME) 27 + * 28 + * If required, KLVs can be labeled with GuC firmware version that added them:: 29 + * 30 + * define(TAG, NAME, MAJOR, MINOR) 31 + * define(TAG, NAME, MAJOR, MINOR, PATCH) 27 32 */ 28 33 #define MAKE_XE_GUC_KLV_THRESHOLDS_SET(define) \ 29 34 define(CAT_ERR, cat_error_count) \ ··· 37 32 define(H2G_STORM, guc_time_us) \ 38 33 define(IRQ_STORM, irq_time_us) \ 39 34 define(DOORBELL_STORM, doorbell_time_us) \ 35 + define(MULTI_LRC_COUNT, multi_lrc_count, 70, 53)\ 40 36 /* end */ 41 37 42 38 /**
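The threshold set above is an X-macro list, so every consumer that passes its own define() now has to tolerate the optional MAJOR, MINOR[, PATCH] columns. A minimal illustration of the pattern with hypothetical names (the kernel's IF_ARGS-based firmware gating is not reproduced here):

	#define MAKE_THRESHOLDS(define) \
		define(CAT_ERR, cat_error_count) \
		define(MULTI_LRC_COUNT, multi_lrc_count, 70, 53) \
		/* end */

	/* expand into an enum of indices; trailing version args are ignored */
	#define to_index(TAG, NAME, VER...) THRESHOLD_INDEX_##TAG,
	enum threshold_index { MAKE_THRESHOLDS(to_index) NUM_THRESHOLDS };
	#undef to_index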
+406 -101
drivers/gpu/drm/xe/xe_guc_log.c
··· 7 7 8 8 #include <linux/fault-inject.h> 9 9 10 + #include <linux/utsname.h> 10 11 #include <drm/drm_managed.h> 11 12 13 + #include "abi/guc_lfd_abi.h" 12 14 #include "regs/xe_guc_regs.h" 13 15 #include "xe_bo.h" 14 16 #include "xe_devcoredump.h" ··· 20 18 #include "xe_map.h" 21 19 #include "xe_mmio.h" 22 20 #include "xe_module.h" 21 + 22 + #define GUC_LOG_CHUNK_SIZE SZ_2M 23 + 24 + /* Magic keys define */ 25 + #define GUC_LFD_DRIVER_KEY_STREAMING 0x8086AAAA474C5346 26 + #define GUC_LFD_LOG_BUFFER_MARKER_2 0xDEADFEED 27 + #define GUC_LFD_CRASH_DUMP_BUFFER_MARKER_2 0x8086DEAD 28 + #define GUC_LFD_STATE_CAPTURE_BUFFER_MARKER_2 0xBEEFFEED 29 + #define GUC_LFD_LOG_BUFFER_MARKER_1V2 0xCABBA9E6 30 + #define GUC_LFD_STATE_CAPTURE_BUFFER_MARKER_1V2 0xCABBA9F7 31 + #define GUC_LFD_DATA_HEADER_MAGIC 0x8086 32 + 33 + /* LFD supported LIC type range */ 34 + #define GUC_LIC_TYPE_FIRST GUC_LIC_TYPE_GUC_SW_VERSION 35 + #define GUC_LIC_TYPE_LAST GUC_LIC_TYPE_BUILD_PLATFORM_ID 36 + #define GUC_LFD_TYPE_FW_RANGE_FIRST GUC_LFD_TYPE_FW_VERSION 37 + #define GUC_LFD_TYPE_FW_RANGE_LAST GUC_LFD_TYPE_BUILD_PLATFORM_ID 38 + 39 + #define GUC_LOG_BUFFER_STATE_HEADER_LENGTH 4096 40 + #define GUC_LOG_BUFFER_INIT_CONFIG 3 41 + 42 + struct guc_log_buffer_entry_list { 43 + u32 offset; 44 + u32 rd_ptr; 45 + u32 wr_ptr; 46 + u32 wrap_offset; 47 + u32 buf_size; 48 + }; 49 + 50 + struct guc_lic_save { 51 + u32 version; 52 + /* 53 + * Array of init config KLV values. 54 + * Range from GUC_LOG_LIC_TYPE_FIRST to GUC_LOG_LIC_TYPE_LAST 55 + */ 56 + u32 values[GUC_LIC_TYPE_LAST - GUC_LIC_TYPE_FIRST + 1]; 57 + struct guc_log_buffer_entry_list entry[GUC_LOG_BUFFER_INIT_CONFIG]; 58 + }; 59 + 60 + static struct guc_log_buffer_entry_markers { 61 + u32 key[2]; 62 + } const entry_markers[GUC_LOG_BUFFER_INIT_CONFIG + 1] = { 63 + {{ 64 + GUC_LFD_LOG_BUFFER_MARKER_1V2, 65 + GUC_LFD_LOG_BUFFER_MARKER_2 66 + }}, 67 + {{ 68 + GUC_LFD_LOG_BUFFER_MARKER_1V2, 69 + GUC_LFD_CRASH_DUMP_BUFFER_MARKER_2 70 + }}, 71 + {{ 72 + GUC_LFD_STATE_CAPTURE_BUFFER_MARKER_1V2, 73 + GUC_LFD_STATE_CAPTURE_BUFFER_MARKER_2 74 + }}, 75 + {{ 76 + GUC_LIC_MAGIC, 77 + (FIELD_PREP_CONST(GUC_LIC_VERSION_MASK_MAJOR, GUC_LIC_VERSION_MAJOR) | 78 + FIELD_PREP_CONST(GUC_LIC_VERSION_MASK_MINOR, GUC_LIC_VERSION_MINOR)) 79 + }} 80 + }; 81 + 82 + static struct guc_log_lic_lfd_map { 83 + u32 lic; 84 + u32 lfd; 85 + } const lic_lfd_type_map[] = { 86 + {GUC_LIC_TYPE_GUC_SW_VERSION, GUC_LFD_TYPE_FW_VERSION}, 87 + {GUC_LIC_TYPE_GUC_DEVICE_ID, GUC_LFD_TYPE_GUC_DEVICE_ID}, 88 + {GUC_LIC_TYPE_TSC_FREQUENCY, GUC_LFD_TYPE_TSC_FREQUENCY}, 89 + {GUC_LIC_TYPE_GMD_ID, GUC_LFD_TYPE_GMD_ID}, 90 + {GUC_LIC_TYPE_BUILD_PLATFORM_ID, GUC_LFD_TYPE_BUILD_PLATFORM_ID} 91 + }; 23 92 24 93 static struct xe_guc * 25 94 log_to_guc(struct xe_guc_log *log) ··· 109 36 { 110 37 return gt_to_xe(log_to_gt(log)); 111 38 } 112 - 113 - static size_t guc_log_size(void) 114 - { 115 - /* 116 - * GuC Log buffer Layout 117 - * 118 - * +===============================+ 00B 119 - * | Crash dump state header | 120 - * +-------------------------------+ 32B 121 - * | Debug state header | 122 - * +-------------------------------+ 64B 123 - * | Capture state header | 124 - * +-------------------------------+ 96B 125 - * | | 126 - * +===============================+ PAGE_SIZE (4KB) 127 - * | Crash Dump logs | 128 - * +===============================+ + CRASH_SIZE 129 - * | Debug logs | 130 - * +===============================+ + DEBUG_SIZE 131 - * | Capture logs | 132 - * +===============================+ + CAPTURE_SIZE 133 - */ 
134 - return PAGE_SIZE + CRASH_BUFFER_SIZE + DEBUG_BUFFER_SIZE + 135 - CAPTURE_BUFFER_SIZE; 136 - } 137 - 138 - #define GUC_LOG_CHUNK_SIZE SZ_2M 139 39 140 40 static struct xe_guc_log_snapshot *xe_guc_log_snapshot_alloc(struct xe_guc_log *log, bool atomic) 141 41 { ··· 191 145 struct xe_device *xe = log_to_xe(log); 192 146 struct xe_guc *guc = log_to_guc(log); 193 147 struct xe_gt *gt = log_to_gt(log); 194 - unsigned int fw_ref; 195 148 size_t remain; 196 149 int i; 197 150 ··· 210 165 remain -= size; 211 166 } 212 167 213 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); 214 - if (!fw_ref) { 168 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT); 169 + if (!fw_ref.domains) 215 170 snapshot->stamp = ~0ULL; 216 - } else { 171 + else 217 172 snapshot->stamp = xe_mmio_read64_2x32(&gt->mmio, GUC_PMTIMESTAMP_LO); 218 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 219 - } 173 + 220 174 snapshot->ktime = ktime_get_boottime_ns(); 221 175 snapshot->level = log->level; 222 176 snapshot->ver_found = guc->fw.versions.found[XE_UC_FW_VER_RELEASE]; ··· 260 216 } 261 217 } 262 218 219 + static inline void lfd_output_binary(struct drm_printer *p, char *buf, int buf_size) 220 + { 221 + seq_write(p->arg, buf, buf_size); 222 + } 223 + 224 + static inline int xe_guc_log_add_lfd_header(struct guc_lfd_data *lfd) 225 + { 226 + lfd->header = FIELD_PREP_CONST(GUC_LFD_DATA_HEADER_MASK_MAGIC, GUC_LFD_DATA_HEADER_MAGIC); 227 + return offsetof(struct guc_lfd_data, data); 228 + } 229 + 230 + static int xe_guc_log_add_typed_payload(struct drm_printer *p, u32 type, 231 + u32 data_len, void *data) 232 + { 233 + struct guc_lfd_data lfd; 234 + int len; 235 + 236 + len = xe_guc_log_add_lfd_header(&lfd); 237 + lfd.header |= FIELD_PREP(GUC_LFD_DATA_HEADER_MASK_TYPE, type); 238 + /* make length DW aligned */ 239 + lfd.data_count = DIV_ROUND_UP(data_len, sizeof(u32)); 240 + lfd_output_binary(p, (char *)&lfd, len); 241 + 242 + lfd_output_binary(p, data, data_len); 243 + len += lfd.data_count * sizeof(u32); 244 + 245 + return len; 246 + } 247 + 248 + static inline int lic_type_to_index(u32 lic_type) 249 + { 250 + XE_WARN_ON(lic_type < GUC_LIC_TYPE_FIRST || lic_type > GUC_LIC_TYPE_LAST); 251 + 252 + return lic_type - GUC_LIC_TYPE_FIRST; 253 + } 254 + 255 + static inline int lfd_type_to_index(u32 lfd_type) 256 + { 257 + int i, lic_type = 0; 258 + 259 + XE_WARN_ON(lfd_type < GUC_LFD_TYPE_FW_RANGE_FIRST || lfd_type > GUC_LFD_TYPE_FW_RANGE_LAST); 260 + 261 + for (i = 0; i < ARRAY_SIZE(lic_lfd_type_map); i++) 262 + if (lic_lfd_type_map[i].lfd == lfd_type) 263 + lic_type = lic_lfd_type_map[i].lic; 264 + 265 + /* If not found, lic_type_to_index will warning invalid type */ 266 + return lic_type_to_index(lic_type); 267 + } 268 + 269 + static int xe_guc_log_add_klv(struct drm_printer *p, u32 lfd_type, 270 + struct guc_lic_save *config) 271 + { 272 + int klv_index = lfd_type_to_index(lfd_type); 273 + 274 + return xe_guc_log_add_typed_payload(p, lfd_type, sizeof(u32), &config->values[klv_index]); 275 + } 276 + 277 + static int xe_guc_log_add_os_id(struct drm_printer *p, u32 id) 278 + { 279 + struct guc_lfd_data_os_info os_id; 280 + struct guc_lfd_data lfd; 281 + int len, info_len, section_len; 282 + char *version; 283 + u32 blank = 0; 284 + 285 + len = xe_guc_log_add_lfd_header(&lfd); 286 + lfd.header |= FIELD_PREP(GUC_LFD_DATA_HEADER_MASK_TYPE, GUC_LFD_TYPE_OS_ID); 287 + 288 + os_id.os_id = id; 289 + section_len = offsetof(struct guc_lfd_data_os_info, build_version); 290 + 291 + version = init_utsname()->release; 292 + info_len = 
strlen(version); 293 + 294 + /* make length DW aligned */ 295 + lfd.data_count = DIV_ROUND_UP(section_len + info_len, sizeof(u32)); 296 + lfd_output_binary(p, (char *)&lfd, len); 297 + lfd_output_binary(p, (char *)&os_id, section_len); 298 + lfd_output_binary(p, version, info_len); 299 + 300 + /* Padding with 0 */ 301 + section_len = lfd.data_count * sizeof(u32) - section_len - info_len; 302 + if (section_len) 303 + lfd_output_binary(p, (char *)&blank, section_len); 304 + 305 + len += lfd.data_count * sizeof(u32); 306 + return len; 307 + } 308 + 309 + static void xe_guc_log_loop_log_init(struct guc_lic *init, struct guc_lic_save *config) 310 + { 311 + struct guc_klv_generic_dw_t *p = (void *)init->data; 312 + int i; 313 + 314 + for (i = 0; i < init->data_count;) { 315 + int klv_len = FIELD_GET(GUC_KLV_0_LEN, p->kl) + 1; 316 + int key = FIELD_GET(GUC_KLV_0_KEY, p->kl); 317 + 318 + if (key < GUC_LIC_TYPE_FIRST || key > GUC_LIC_TYPE_LAST) { 319 + XE_WARN_ON(key < GUC_LIC_TYPE_FIRST || key > GUC_LIC_TYPE_LAST); 320 + break; 321 + } 322 + config->values[lic_type_to_index(key)] = p->value; 323 + i += klv_len + 1; /* Whole KLV structure length in dwords */ 324 + p = (void *)((u32 *)p + klv_len); 325 + } 326 + } 327 + 328 + static int find_marker(u32 mark0, u32 mark1) 329 + { 330 + int i; 331 + 332 + for (i = 0; i < ARRAY_SIZE(entry_markers); i++) 333 + if (mark0 == entry_markers[i].key[0] && mark1 == entry_markers[i].key[1]) 334 + return i; 335 + 336 + return ARRAY_SIZE(entry_markers); 337 + } 338 + 339 + static void xe_guc_log_load_lic(void *guc_log, struct guc_lic_save *config) 340 + { 341 + u32 offset = GUC_LOG_BUFFER_STATE_HEADER_LENGTH; 342 + struct guc_log_buffer_state *p = guc_log; 343 + 344 + config->version = p->version; 345 + while (p->marker[0]) { 346 + int index; 347 + 348 + index = find_marker(p->marker[0], p->marker[1]); 349 + 350 + if (index < ARRAY_SIZE(entry_markers)) { 351 + if (index == GUC_LOG_BUFFER_INIT_CONFIG) { 352 + /* Load log init config */ 353 + xe_guc_log_loop_log_init((void *)p, config); 354 + 355 + /* LIC structure is the last */ 356 + return; 357 + } 358 + config->entry[index].offset = offset; 359 + config->entry[index].rd_ptr = p->read_ptr; 360 + config->entry[index].wr_ptr = p->write_ptr; 361 + config->entry[index].wrap_offset = p->wrap_offset; 362 + config->entry[index].buf_size = p->size; 363 + } 364 + offset += p->size; 365 + p++; 366 + } 367 + } 368 + 369 + static int 370 + xe_guc_log_output_lfd_init(struct drm_printer *p, struct xe_guc_log_snapshot *snapshot, 371 + struct guc_lic_save *config) 372 + { 373 + int type, len; 374 + size_t size = 0; 375 + 376 + /* FW required types */ 377 + for (type = GUC_LFD_TYPE_FW_RANGE_FIRST; type <= GUC_LFD_TYPE_FW_RANGE_LAST; type++) 378 + size += xe_guc_log_add_klv(p, type, config); 379 + 380 + /* KMD required type(s) */ 381 + len = xe_guc_log_add_os_id(p, GUC_LFD_OS_TYPE_OSID_LIN); 382 + size += len; 383 + 384 + return size; 385 + } 386 + 387 + static void 388 + xe_guc_log_print_chunks(struct drm_printer *p, struct xe_guc_log_snapshot *snapshot, 389 + u32 from, u32 to) 390 + { 391 + int chunk_from = from % GUC_LOG_CHUNK_SIZE; 392 + int chunk_id = from / GUC_LOG_CHUNK_SIZE; 393 + int to_chunk_id = to / GUC_LOG_CHUNK_SIZE; 394 + int chunk_to = to % GUC_LOG_CHUNK_SIZE; 395 + int pos = from; 396 + 397 + do { 398 + size_t size = (to_chunk_id == chunk_id ? 
chunk_to : GUC_LOG_CHUNK_SIZE) - 399 + chunk_from; 400 + 401 + lfd_output_binary(p, snapshot->copy[chunk_id] + chunk_from, size); 402 + pos += size; 403 + chunk_id++; 404 + chunk_from = 0; 405 + } while (pos < to); 406 + } 407 + 408 + static inline int 409 + xe_guc_log_add_log_event(struct drm_printer *p, struct xe_guc_log_snapshot *snapshot, 410 + struct guc_lic_save *config) 411 + { 412 + size_t size; 413 + u32 data_len, section_len; 414 + struct guc_lfd_data lfd; 415 + struct guc_log_buffer_entry_list *entry; 416 + struct guc_lfd_data_log_events_buf events_buf; 417 + 418 + entry = &config->entry[GUC_LOG_TYPE_EVENT_DATA]; 419 + 420 + /* Skip empty log */ 421 + if (entry->rd_ptr == entry->wr_ptr) 422 + return 0; 423 + 424 + size = xe_guc_log_add_lfd_header(&lfd); 425 + lfd.header |= FIELD_PREP(GUC_LFD_DATA_HEADER_MASK_TYPE, GUC_LFD_TYPE_LOG_EVENTS_BUFFER); 426 + events_buf.log_events_format_version = config->version; 427 + 428 + /* Adjust to log_format_buf */ 429 + section_len = offsetof(struct guc_lfd_data_log_events_buf, log_event); 430 + data_len = section_len; 431 + 432 + /* Calculate data length */ 433 + data_len += entry->rd_ptr < entry->wr_ptr ? (entry->wr_ptr - entry->rd_ptr) : 434 + (entry->wr_ptr + entry->wrap_offset - entry->rd_ptr); 435 + /* make length u32 aligned */ 436 + lfd.data_count = DIV_ROUND_UP(data_len, sizeof(u32)); 437 + 438 + /* Output GUC_LFD_TYPE_LOG_EVENTS_BUFFER header */ 439 + lfd_output_binary(p, (char *)&lfd, size); 440 + lfd_output_binary(p, (char *)&events_buf, section_len); 441 + 442 + /* Output data from guc log chunks directly */ 443 + if (entry->rd_ptr < entry->wr_ptr) { 444 + xe_guc_log_print_chunks(p, snapshot, entry->offset + entry->rd_ptr, 445 + entry->offset + entry->wr_ptr); 446 + } else { 447 + /* 1st, print from rd to wrap offset */ 448 + xe_guc_log_print_chunks(p, snapshot, entry->offset + entry->rd_ptr, 449 + entry->offset + entry->wrap_offset); 450 + 451 + /* 2nd, print from buf start to wr */ 452 + xe_guc_log_print_chunks(p, snapshot, entry->offset, entry->offset + entry->wr_ptr); 453 + } 454 + return size; 455 + } 456 + 457 + static int 458 + xe_guc_log_add_crash_dump(struct drm_printer *p, struct xe_guc_log_snapshot *snapshot, 459 + struct guc_lic_save *config) 460 + { 461 + struct guc_log_buffer_entry_list *entry; 462 + int chunk_from, chunk_id; 463 + int from, to, i; 464 + size_t size = 0; 465 + u32 *buf32; 466 + 467 + entry = &config->entry[GUC_LOG_TYPE_CRASH_DUMP]; 468 + 469 + /* Skip zero sized crash dump */ 470 + if (!entry->buf_size) 471 + return 0; 472 + 473 + /* Check if crash dump section are all zero */ 474 + from = entry->offset; 475 + to = entry->offset + entry->buf_size; 476 + chunk_from = from % GUC_LOG_CHUNK_SIZE; 477 + chunk_id = from / GUC_LOG_CHUNK_SIZE; 478 + buf32 = snapshot->copy[chunk_id] + chunk_from; 479 + 480 + for (i = 0; i < entry->buf_size / sizeof(u32); i++) 481 + if (buf32[i]) 482 + break; 483 + 484 + /* Buffer has non-zero data? 
*/ 485 + if (i < entry->buf_size / sizeof(u32)) { 486 + struct guc_lfd_data lfd; 487 + 488 + size = xe_guc_log_add_lfd_header(&lfd); 489 + lfd.header |= FIELD_PREP(GUC_LFD_DATA_HEADER_MASK_TYPE, GUC_LFD_TYPE_FW_CRASH_DUMP); 490 + /* Calculate data length */ 491 + lfd.data_count = DIV_ROUND_UP(entry->buf_size, sizeof(u32)); 492 + /* Output GUC_LFD_TYPE_FW_CRASH_DUMP header */ 493 + lfd_output_binary(p, (char *)&lfd, size); 494 + 495 + /* rd/wr ptr is not used for crash dump */ 496 + xe_guc_log_print_chunks(p, snapshot, from, to); 497 + } 498 + return size; 499 + } 500 + 501 + static void 502 + xe_guc_log_snapshot_print_lfd(struct xe_guc_log_snapshot *snapshot, struct drm_printer *p) 503 + { 504 + struct guc_lfd_file_header header; 505 + struct guc_lic_save config; 506 + size_t size; 507 + 508 + if (!snapshot || !snapshot->size) 509 + return; 510 + 511 + header.magic = GUC_LFD_DRIVER_KEY_STREAMING; 512 + header.version = FIELD_PREP_CONST(GUC_LFD_FILE_HEADER_VERSION_MASK_MINOR, 513 + GUC_LFD_FORMAT_VERSION_MINOR) | 514 + FIELD_PREP_CONST(GUC_LFD_FILE_HEADER_VERSION_MASK_MAJOR, 515 + GUC_LFD_FORMAT_VERSION_MAJOR); 516 + 517 + /* Output LFD file header */ 518 + lfd_output_binary(p, (char *)&header, 519 + offsetof(struct guc_lfd_file_header, stream)); 520 + 521 + /* Output LFD stream */ 522 + xe_guc_log_load_lic(snapshot->copy[0], &config); 523 + size = xe_guc_log_output_lfd_init(p, snapshot, &config); 524 + if (!size) 525 + return; 526 + 527 + xe_guc_log_add_log_event(p, snapshot, &config); 528 + xe_guc_log_add_crash_dump(p, snapshot, &config); 529 + } 530 + 263 531 /** 264 532 * xe_guc_log_print_dmesg - dump a copy of the GuC log to dmesg 265 533 * @log: GuC log structure ··· 607 251 xe_guc_log_snapshot_free(snapshot); 608 252 } 609 253 254 + /** 255 + * xe_guc_log_print_lfd - dump a copy of the GuC log in LFD format 256 + * @log: GuC log structure 257 + * @p: the printer object to output to 258 + */ 259 + void xe_guc_log_print_lfd(struct xe_guc_log *log, struct drm_printer *p) 260 + { 261 + struct xe_guc_log_snapshot *snapshot; 262 + 263 + snapshot = xe_guc_log_snapshot_capture(log, false); 264 + xe_guc_log_snapshot_print_lfd(snapshot, p); 265 + xe_guc_log_snapshot_free(snapshot); 266 + } 267 + 610 268 int xe_guc_log_init(struct xe_guc_log *log) 611 269 { 612 270 struct xe_device *xe = log_to_xe(log); 613 271 struct xe_tile *tile = gt_to_tile(log_to_gt(log)); 614 272 struct xe_bo *bo; 615 273 616 - bo = xe_managed_bo_create_pin_map(xe, tile, guc_log_size(), 274 + bo = xe_managed_bo_create_pin_map(xe, tile, GUC_LOG_SIZE, 617 275 XE_BO_FLAG_SYSTEM | 618 276 XE_BO_FLAG_GGTT | 619 277 XE_BO_FLAG_GGTT_INVALIDATE | ··· 635 265 if (IS_ERR(bo)) 636 266 return PTR_ERR(bo); 637 267 638 - xe_map_memset(xe, &bo->vmap, 0, 0, guc_log_size()); 268 + xe_map_memset(xe, &bo->vmap, 0, 0, xe_bo_size(bo)); 639 269 log->bo = bo; 640 270 log->level = xe_modparam.guc_log_level; 641 271 ··· 643 273 } 644 274 645 275 ALLOW_ERROR_INJECTION(xe_guc_log_init, ERRNO); /* See xe_pci_probe() */ 646 - 647 - static u32 xe_guc_log_section_size_crash(struct xe_guc_log *log) 648 - { 649 - return CRASH_BUFFER_SIZE; 650 - } 651 - 652 - static u32 xe_guc_log_section_size_debug(struct xe_guc_log *log) 653 - { 654 - return DEBUG_BUFFER_SIZE; 655 - } 656 - 657 - /** 658 - * xe_guc_log_section_size_capture - Get capture buffer size within log sections. 659 - * @log: The log object. 660 - * 661 - * This function will return the capture buffer size within log sections. 662 - * 663 - * Return: capture buffer size. 
664 - */ 665 - u32 xe_guc_log_section_size_capture(struct xe_guc_log *log) 666 - { 667 - return CAPTURE_BUFFER_SIZE; 668 - } 669 - 670 - /** 671 - * xe_guc_get_log_buffer_size - Get log buffer size for a type. 672 - * @log: The log object. 673 - * @type: The log buffer type 674 - * 675 - * Return: buffer size. 676 - */ 677 - u32 xe_guc_get_log_buffer_size(struct xe_guc_log *log, enum guc_log_buffer_type type) 678 - { 679 - switch (type) { 680 - case GUC_LOG_BUFFER_CRASH_DUMP: 681 - return xe_guc_log_section_size_crash(log); 682 - case GUC_LOG_BUFFER_DEBUG: 683 - return xe_guc_log_section_size_debug(log); 684 - case GUC_LOG_BUFFER_CAPTURE: 685 - return xe_guc_log_section_size_capture(log); 686 - } 687 - return 0; 688 - } 689 - 690 - /** 691 - * xe_guc_get_log_buffer_offset - Get offset in log buffer for a type. 692 - * @log: The log object. 693 - * @type: The log buffer type 694 - * 695 - * This function will return the offset in the log buffer for a type. 696 - * Return: buffer offset. 697 - */ 698 - u32 xe_guc_get_log_buffer_offset(struct xe_guc_log *log, enum guc_log_buffer_type type) 699 - { 700 - enum guc_log_buffer_type i; 701 - u32 offset = PAGE_SIZE;/* for the log_buffer_states */ 702 - 703 - for (i = GUC_LOG_BUFFER_CRASH_DUMP; i < GUC_LOG_BUFFER_TYPE_MAX; ++i) { 704 - if (i == type) 705 - break; 706 - offset += xe_guc_get_log_buffer_size(log, i); 707 - } 708 - 709 - return offset; 710 - } 711 276 712 277 /** 713 278 * xe_guc_check_log_buf_overflow - Check if log buffer overflowed ··· 657 352 * 658 353 * Return: True if overflowed. 659 354 */ 660 - bool xe_guc_check_log_buf_overflow(struct xe_guc_log *log, enum guc_log_buffer_type type, 355 + bool xe_guc_check_log_buf_overflow(struct xe_guc_log *log, enum guc_log_type type, 661 356 unsigned int full_cnt) 662 357 { 663 358 unsigned int prev_full_cnt = log->stats[type].sampled_overflow;
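Note on the LFD (log format dump) output path added in xe_guc_log.c above: most of the heavy lifting is two small calculations, the dword-aligned payload length stored in guc_lfd_data.data_count and the wrap-aware length of the event-data section derived from rd_ptr/wr_ptr/wrap_offset. A minimal userspace sketch of just those calculations follows; the 16/16 magic/type split in the header word is an assumption made for illustration, the real field masks live in abi/guc_lfd_abi.h and are not part of this hunk.

#include <stdio.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Assumed split: magic in the high 16 bits, type in the low 16 bits. */
static unsigned int lfd_header(unsigned int type)
{
	return (0x8086u << 16) | (type & 0xffffu);
}

/* Payload length in dwords, rounded up, as in xe_guc_log_add_typed_payload(). */
static unsigned int lfd_data_count(unsigned int data_len_bytes)
{
	return DIV_ROUND_UP(data_len_bytes, 4u);
}

/* Live bytes between read and write pointers, honouring the wrap offset,
 * as in xe_guc_log_add_log_event(). */
static unsigned int log_event_len(unsigned int rd, unsigned int wr, unsigned int wrap)
{
	return rd < wr ? wr - rd : wr + wrap - rd;
}

int main(void)
{
	printf("header=0x%08x dwords=%u\n", lfd_header(5), lfd_data_count(13));
	printf("wrapped length=%u\n", log_event_len(0x3000, 0x100, 0x4000));
	return 0;
}

Run as-is this prints header=0x80860005, dwords=4 and wrapped length=4352, matching the rounding and wrap rules the driver code above applies before streaming the chunks out.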
+20 -10
drivers/gpu/drm/xe/xe_guc_log.h
··· 13 13 struct xe_device; 14 14 15 15 #if IS_ENABLED(CONFIG_DRM_XE_DEBUG_GUC) 16 - #define CRASH_BUFFER_SIZE SZ_1M 17 - #define DEBUG_BUFFER_SIZE SZ_8M 18 - #define CAPTURE_BUFFER_SIZE SZ_2M 16 + #define XE_GUC_LOG_EVENT_DATA_BUFFER_SIZE SZ_8M 17 + #define XE_GUC_LOG_CRASH_DUMP_BUFFER_SIZE SZ_1M 18 + #define XE_GUC_LOG_STATE_CAPTURE_BUFFER_SIZE SZ_2M 19 19 #else 20 - #define CRASH_BUFFER_SIZE SZ_16K 21 - #define DEBUG_BUFFER_SIZE SZ_64K 22 - #define CAPTURE_BUFFER_SIZE SZ_1M 20 + #define XE_GUC_LOG_EVENT_DATA_BUFFER_SIZE SZ_64K 21 + #define XE_GUC_LOG_CRASH_DUMP_BUFFER_SIZE SZ_16K 22 + #define XE_GUC_LOG_STATE_CAPTURE_BUFFER_SIZE SZ_1M 23 23 #endif 24 + 25 + #define GUC_LOG_SIZE (SZ_4K + \ 26 + XE_GUC_LOG_EVENT_DATA_BUFFER_SIZE + \ 27 + XE_GUC_LOG_CRASH_DUMP_BUFFER_SIZE + \ 28 + XE_GUC_LOG_STATE_CAPTURE_BUFFER_SIZE) 29 + 30 + #define XE_GUC_LOG_EVENT_DATA_OFFSET SZ_4K 31 + #define XE_GUC_LOG_CRASH_DUMP_OFFSET (XE_GUC_LOG_EVENT_DATA_OFFSET + \ 32 + XE_GUC_LOG_EVENT_DATA_BUFFER_SIZE) 33 + #define XE_GUC_LOG_STATE_CAPTURE_OFFSET (XE_GUC_LOG_CRASH_DUMP_OFFSET + \ 34 + XE_GUC_LOG_CRASH_DUMP_BUFFER_SIZE) 35 + 24 36 /* 25 37 * While we're using plain log level in i915, GuC controls are much more... 26 38 * "elaborate"? We have a couple of bits for verbosity, separate bit for actual ··· 52 40 53 41 int xe_guc_log_init(struct xe_guc_log *log); 54 42 void xe_guc_log_print(struct xe_guc_log *log, struct drm_printer *p); 43 + void xe_guc_log_print_lfd(struct xe_guc_log *log, struct drm_printer *p); 55 44 void xe_guc_log_print_dmesg(struct xe_guc_log *log); 56 45 struct xe_guc_log_snapshot *xe_guc_log_snapshot_capture(struct xe_guc_log *log, bool atomic); 57 46 void xe_guc_log_snapshot_print(struct xe_guc_log_snapshot *snapshot, struct drm_printer *p); ··· 64 51 return log->level; 65 52 } 66 53 67 - u32 xe_guc_log_section_size_capture(struct xe_guc_log *log); 68 - u32 xe_guc_get_log_buffer_size(struct xe_guc_log *log, enum guc_log_buffer_type type); 69 - u32 xe_guc_get_log_buffer_offset(struct xe_guc_log *log, enum guc_log_buffer_type type); 70 54 bool xe_guc_check_log_buf_overflow(struct xe_guc_log *log, 71 - enum guc_log_buffer_type type, 55 + enum guc_log_type type, 72 56 unsigned int full_cnt); 73 57 74 58 #endif
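The header changes above fix the log layout at compile time: a 4 KiB block of state headers followed by the event-data, crash-dump and state-capture sections back to back, with GUC_LOG_SIZE as their sum. A small sketch of that composition using the non-debug sizes; the macros are reproduced here as plain constants and the output is illustrative only.

#include <stdio.h>

#define SZ_4K	0x1000u
#define SZ_16K	0x4000u
#define SZ_64K	0x10000u
#define SZ_1M	0x100000u

int main(void)
{
	unsigned int event_off   = SZ_4K;		/* state headers first */
	unsigned int crash_off   = event_off + SZ_64K;	/* after event data */
	unsigned int capture_off = crash_off + SZ_16K;	/* after crash dump */
	unsigned int total       = capture_off + SZ_1M;	/* == GUC_LOG_SIZE */

	printf("event data     @ 0x%06x\n", event_off);
	printf("crash dump     @ 0x%06x\n", crash_off);
	printf("state capture  @ 0x%06x\n", capture_off);
	printf("total log size = 0x%06x (%u KiB)\n", total, total / 1024);
	return 0;
}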
+19 -47
drivers/gpu/drm/xe/xe_guc_pc.c
··· 76 76 * exposes a programming interface to the host for the control of SLPC. 77 77 * 78 78 * Frequency management: 79 - * ===================== 79 + * --------------------- 80 80 * 81 81 * Xe driver enables SLPC with all of its defaults features and frequency 82 82 * selection, which varies per platform. ··· 87 87 * for any workload. 88 88 * 89 89 * Render-C States: 90 - * ================ 90 + * ---------------- 91 91 * 92 92 * Render-C states is also a GuC PC feature that is now enabled in Xe for 93 93 * all platforms. ··· 499 499 int xe_guc_pc_get_cur_freq(struct xe_guc_pc *pc, u32 *freq) 500 500 { 501 501 struct xe_gt *gt = pc_to_gt(pc); 502 - unsigned int fw_ref; 503 502 504 503 /* 505 504 * GuC SLPC plays with cur freq request when GuCRC is enabled 506 505 * Block RC6 for a more reliable read. 507 506 */ 508 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); 509 - if (!xe_force_wake_ref_has_domain(fw_ref, XE_FW_GT)) { 510 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 507 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT); 508 + if (!xe_force_wake_ref_has_domain(fw_ref.domains, XE_FW_GT)) 511 509 return -ETIMEDOUT; 512 - } 513 510 514 511 *freq = get_cur_freq(gt); 515 512 516 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 517 513 return 0; 518 514 } 519 515 ··· 1083 1087 */ 1084 1088 int xe_guc_pc_override_gucrc_mode(struct xe_guc_pc *pc, enum slpc_gucrc_mode mode) 1085 1089 { 1086 - int ret; 1087 - 1088 - xe_pm_runtime_get(pc_to_xe(pc)); 1089 - ret = pc_action_set_param(pc, SLPC_PARAM_PWRGATE_RC_MODE, mode); 1090 - xe_pm_runtime_put(pc_to_xe(pc)); 1091 - 1092 - return ret; 1090 + guard(xe_pm_runtime)(pc_to_xe(pc)); 1091 + return pc_action_set_param(pc, SLPC_PARAM_PWRGATE_RC_MODE, mode); 1093 1092 } 1094 1093 1095 1094 /** ··· 1095 1104 */ 1096 1105 int xe_guc_pc_unset_gucrc_mode(struct xe_guc_pc *pc) 1097 1106 { 1098 - int ret; 1099 - 1100 - xe_pm_runtime_get(pc_to_xe(pc)); 1101 - ret = pc_action_unset_param(pc, SLPC_PARAM_PWRGATE_RC_MODE); 1102 - xe_pm_runtime_put(pc_to_xe(pc)); 1103 - 1104 - return ret; 1107 + guard(xe_pm_runtime)(pc_to_xe(pc)); 1108 + return pc_action_unset_param(pc, SLPC_PARAM_PWRGATE_RC_MODE); 1105 1109 } 1106 1110 1107 1111 static void pc_init_pcode_freq(struct xe_guc_pc *pc) ··· 1184 1198 return -EINVAL; 1185 1199 1186 1200 guard(mutex)(&pc->freq_lock); 1187 - xe_pm_runtime_get_noresume(pc_to_xe(pc)); 1201 + guard(xe_pm_runtime_noresume)(pc_to_xe(pc)); 1188 1202 1189 1203 ret = pc_action_set_param(pc, 1190 1204 SLPC_PARAM_POWER_PROFILE, ··· 1194 1208 val, ERR_PTR(ret)); 1195 1209 else 1196 1210 pc->power_profile = val; 1197 - 1198 - xe_pm_runtime_put(pc_to_xe(pc)); 1199 1211 1200 1212 return ret; 1201 1213 } ··· 1207 1223 struct xe_device *xe = pc_to_xe(pc); 1208 1224 struct xe_gt *gt = pc_to_gt(pc); 1209 1225 u32 size = PAGE_ALIGN(sizeof(struct slpc_shared_data)); 1210 - unsigned int fw_ref; 1211 1226 ktime_t earlier; 1212 1227 int ret; 1213 1228 1214 1229 xe_gt_assert(gt, xe_device_uc_enabled(xe)); 1215 1230 1216 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); 1217 - if (!xe_force_wake_ref_has_domain(fw_ref, XE_FW_GT)) { 1218 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 1231 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT); 1232 + if (!xe_force_wake_ref_has_domain(fw_ref.domains, XE_FW_GT)) 1219 1233 return -ETIMEDOUT; 1220 - } 1221 1234 1222 1235 if (xe->info.skip_guc_pc) { 1223 1236 if (xe->info.platform != XE_PVC) ··· 1222 1241 1223 1242 /* Request max possible since dynamic freq mgmt is not enabled */ 1224 1243 pc_set_cur_freq(pc, 
UINT_MAX); 1225 - 1226 - ret = 0; 1227 - goto out; 1244 + return 0; 1228 1245 } 1229 1246 1230 1247 xe_map_memset(xe, &pc->bo->vmap, 0, 0, size); ··· 1231 1252 earlier = ktime_get(); 1232 1253 ret = pc_action_reset(pc); 1233 1254 if (ret) 1234 - goto out; 1255 + return ret; 1235 1256 1236 1257 if (wait_for_pc_state(pc, SLPC_GLOBAL_STATE_RUNNING, 1237 1258 SLPC_RESET_TIMEOUT_MS)) { ··· 1242 1263 if (wait_for_pc_state(pc, SLPC_GLOBAL_STATE_RUNNING, 1243 1264 SLPC_RESET_EXTENDED_TIMEOUT_MS)) { 1244 1265 xe_gt_err(gt, "GuC PC Start failed: Dynamic GT frequency control and GT sleep states are now disabled.\n"); 1245 - ret = -EIO; 1246 - goto out; 1266 + return -EIO; 1247 1267 } 1248 1268 1249 1269 xe_gt_warn(gt, "GuC PC excessive start time: %lldms", ··· 1251 1273 1252 1274 ret = pc_init_freqs(pc); 1253 1275 if (ret) 1254 - goto out; 1276 + return ret; 1255 1277 1256 1278 ret = pc_set_mert_freq_cap(pc); 1257 1279 if (ret) 1258 - goto out; 1280 + return ret; 1259 1281 1260 1282 if (xe->info.platform == XE_PVC) { 1261 1283 xe_guc_pc_gucrc_disable(pc); 1262 - ret = 0; 1263 - goto out; 1284 + return 0; 1264 1285 } 1265 1286 1266 1287 ret = pc_action_setup_gucrc(pc, GUCRC_FIRMWARE_CONTROL); 1267 1288 if (ret) 1268 - goto out; 1289 + return ret; 1269 1290 1270 1291 /* Enable SLPC Optimized Strategy for compute */ 1271 1292 ret = pc_action_set_strategy(pc, SLPC_OPTIMIZED_STRATEGY_COMPUTE); ··· 1274 1297 if (unlikely(ret)) 1275 1298 xe_gt_err(gt, "Failed to set SLPC power profile: %pe\n", ERR_PTR(ret)); 1276 1299 1277 - out: 1278 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 1279 1300 return ret; 1280 1301 } 1281 1302 ··· 1305 1330 { 1306 1331 struct xe_guc_pc *pc = arg; 1307 1332 struct xe_device *xe = pc_to_xe(pc); 1308 - unsigned int fw_ref; 1309 1333 1310 1334 if (xe_device_wedged(xe)) 1311 1335 return; 1312 1336 1313 - fw_ref = xe_force_wake_get(gt_to_fw(pc_to_gt(pc)), XE_FORCEWAKE_ALL); 1337 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(pc_to_gt(pc)), XE_FORCEWAKE_ALL); 1314 1338 xe_guc_pc_gucrc_disable(pc); 1315 1339 XE_WARN_ON(xe_guc_pc_stop(pc)); 1316 1340 1317 1341 /* Bind requested freq to mert_freq_cap before unload */ 1318 1342 pc_set_cur_freq(pc, min(pc_max_freq_cap(pc), xe_guc_pc_get_rpe_freq(pc))); 1319 - 1320 - xe_force_wake_put(gt_to_fw(pc_to_gt(pc)), fw_ref); 1321 1343 } 1322 1344 1323 1345 /**
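The xe_guc_pc.c hunk above trades paired acquire/release calls (xe_force_wake_get/put, xe_pm_runtime_get/put) for the scope-based CLASS()/guard() helpers, which is why the error paths lose their explicit puts and the common out: label disappears. Those helpers come from linux/cleanup.h and sit on top of the compiler's cleanup attribute; below is a userspace sketch of that underlying pattern, with an invented wake_ref type and get/put pair standing in for the real forcewake interface.

#include <stdio.h>

struct wake_ref { unsigned int domains; };

static struct wake_ref wake_get(unsigned int want)
{
	printf("get  domains=%#x\n", want);
	return (struct wake_ref){ .domains = want };
}

static void wake_put(struct wake_ref *ref)
{
	if (ref->domains)
		printf("put  domains=%#x\n", ref->domains);
}

/* The cleanup attribute runs wake_put() whenever ref leaves scope,
 * no matter which return statement is taken. */
static int do_work(unsigned int want, int fail)
{
	__attribute__((cleanup(wake_put))) struct wake_ref ref = wake_get(want);

	if (!ref.domains)
		return -1;	/* e.g. a timeout; wake_put() still runs */
	if (fail)
		return -5;	/* failure path, also auto-released */
	return 0;
}

int main(void)
{
	printf("rc=%d\n", do_work(0x1, 0));
	printf("rc=%d\n", do_work(0x1, 1));
	return 0;
}

The release running automatically on every return path is the property the converted error paths in the hunk rely on.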
+637 -68
drivers/gpu/drm/xe/xe_guc_submit.c
··· 19 19 #include "abi/guc_klvs_abi.h" 20 20 #include "regs/xe_lrc_layout.h" 21 21 #include "xe_assert.h" 22 + #include "xe_bo.h" 22 23 #include "xe_devcoredump.h" 23 24 #include "xe_device.h" 24 25 #include "xe_exec_queue.h" ··· 48 47 #include "xe_uc_fw.h" 49 48 #include "xe_vm.h" 50 49 50 + #define XE_GUC_EXEC_QUEUE_CGP_CONTEXT_ERROR_LEN 6 51 + 51 52 static struct xe_guc * 52 53 exec_queue_to_guc(struct xe_exec_queue *q) 53 54 { ··· 75 72 #define EXEC_QUEUE_STATE_EXTRA_REF (1 << 11) 76 73 #define EXEC_QUEUE_STATE_PENDING_RESUME (1 << 12) 77 74 #define EXEC_QUEUE_STATE_PENDING_TDR_EXIT (1 << 13) 75 + #define EXEC_QUEUE_STATE_IDLE_SKIP_SUSPEND (1 << 14) 78 76 79 77 static bool exec_queue_registered(struct xe_exec_queue *q) 80 78 { ··· 265 261 static void clear_exec_queue_pending_tdr_exit(struct xe_exec_queue *q) 266 262 { 267 263 atomic_and(~EXEC_QUEUE_STATE_PENDING_TDR_EXIT, &q->guc->state); 264 + } 265 + 266 + static bool exec_queue_idle_skip_suspend(struct xe_exec_queue *q) 267 + { 268 + return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_IDLE_SKIP_SUSPEND; 269 + } 270 + 271 + static void set_exec_queue_idle_skip_suspend(struct xe_exec_queue *q) 272 + { 273 + atomic_or(EXEC_QUEUE_STATE_IDLE_SKIP_SUSPEND, &q->guc->state); 274 + } 275 + 276 + static void clear_exec_queue_idle_skip_suspend(struct xe_exec_queue *q) 277 + { 278 + atomic_and(~EXEC_QUEUE_STATE_IDLE_SKIP_SUSPEND, &q->guc->state); 268 279 } 269 280 270 281 static bool exec_queue_killed_or_banned_or_wedged(struct xe_exec_queue *q) ··· 560 541 u32 slpc_exec_queue_freq_req = 0; 561 542 u32 preempt_timeout_us = q->sched_props.preempt_timeout_us; 562 543 563 - xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q)); 544 + xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q) && 545 + !xe_exec_queue_is_multi_queue_secondary(q)); 564 546 565 547 if (q->flags & EXEC_QUEUE_FLAG_LOW_LATENCY) 566 548 slpc_exec_queue_freq_req |= SLPC_CTX_FREQ_REQ_IS_COMPUTE; ··· 581 561 { 582 562 struct exec_queue_policy policy; 583 563 564 + xe_assert(guc_to_xe(guc), !xe_exec_queue_is_multi_queue_secondary(q)); 565 + 584 566 __guc_exec_queue_policy_start_klv(&policy, q->guc->id); 585 567 __guc_exec_queue_policy_add_preemption_timeout(&policy, 1); 586 568 587 569 xe_guc_ct_send(&guc->ct, (u32 *)&policy.h2g, 588 570 __guc_exec_queue_policy_action_size(&policy), 0, 0); 571 + } 572 + 573 + static bool vf_recovery(struct xe_guc *guc) 574 + { 575 + return xe_gt_recovery_pending(guc_to_gt(guc)); 576 + } 577 + 578 + static void xe_guc_exec_queue_trigger_cleanup(struct xe_exec_queue *q) 579 + { 580 + struct xe_guc *guc = exec_queue_to_guc(q); 581 + struct xe_device *xe = guc_to_xe(guc); 582 + 583 + /** to wakeup xe_wait_user_fence ioctl if exec queue is reset */ 584 + wake_up_all(&xe->ufence_wq); 585 + 586 + if (xe_exec_queue_is_lr(q)) 587 + queue_work(guc_to_gt(guc)->ordered_wq, &q->guc->lr_tdr); 588 + else 589 + xe_sched_tdr_queue_imm(&q->guc->sched); 590 + } 591 + 592 + static void xe_guc_exec_queue_group_trigger_cleanup(struct xe_exec_queue *q) 593 + { 594 + struct xe_exec_queue *primary = xe_exec_queue_multi_queue_primary(q); 595 + struct xe_exec_queue_group *group = q->multi_queue.group; 596 + struct xe_exec_queue *eq; 597 + 598 + xe_gt_assert(guc_to_gt(exec_queue_to_guc(q)), 599 + xe_exec_queue_is_multi_queue(q)); 600 + 601 + /* Group banned, skip timeout check in TDR */ 602 + WRITE_ONCE(group->banned, true); 603 + xe_guc_exec_queue_trigger_cleanup(primary); 604 + 605 + mutex_lock(&group->list_lock); 606 + list_for_each_entry(eq, &group->list, 
multi_queue.link) 607 + xe_guc_exec_queue_trigger_cleanup(eq); 608 + mutex_unlock(&group->list_lock); 609 + } 610 + 611 + static void xe_guc_exec_queue_reset_trigger_cleanup(struct xe_exec_queue *q) 612 + { 613 + if (xe_exec_queue_is_multi_queue(q)) { 614 + struct xe_exec_queue *primary = xe_exec_queue_multi_queue_primary(q); 615 + struct xe_exec_queue_group *group = q->multi_queue.group; 616 + struct xe_exec_queue *eq; 617 + 618 + /* Group banned, skip timeout check in TDR */ 619 + WRITE_ONCE(group->banned, true); 620 + 621 + set_exec_queue_reset(primary); 622 + if (!exec_queue_banned(primary) && !exec_queue_check_timeout(primary)) 623 + xe_guc_exec_queue_trigger_cleanup(primary); 624 + 625 + mutex_lock(&group->list_lock); 626 + list_for_each_entry(eq, &group->list, multi_queue.link) { 627 + set_exec_queue_reset(eq); 628 + if (!exec_queue_banned(eq) && !exec_queue_check_timeout(eq)) 629 + xe_guc_exec_queue_trigger_cleanup(eq); 630 + } 631 + mutex_unlock(&group->list_lock); 632 + } else { 633 + set_exec_queue_reset(q); 634 + if (!exec_queue_banned(q) && !exec_queue_check_timeout(q)) 635 + xe_guc_exec_queue_trigger_cleanup(q); 636 + } 637 + } 638 + 639 + static void set_exec_queue_group_banned(struct xe_exec_queue *q) 640 + { 641 + struct xe_exec_queue *primary = xe_exec_queue_multi_queue_primary(q); 642 + struct xe_exec_queue_group *group = q->multi_queue.group; 643 + struct xe_exec_queue *eq; 644 + 645 + /* Ban all queues of the multi-queue group */ 646 + xe_gt_assert(guc_to_gt(exec_queue_to_guc(q)), 647 + xe_exec_queue_is_multi_queue(q)); 648 + set_exec_queue_banned(primary); 649 + 650 + mutex_lock(&group->list_lock); 651 + list_for_each_entry(eq, &group->list, multi_queue.link) 652 + set_exec_queue_banned(eq); 653 + mutex_unlock(&group->list_lock); 589 654 } 590 655 591 656 #define parallel_read(xe_, map_, field_) \ ··· 679 574 #define parallel_write(xe_, map_, field_, val_) \ 680 575 xe_map_wr_field(xe_, &map_, 0, struct guc_submit_parallel_scratch, \ 681 576 field_, val_) 577 + 578 + /** 579 + * DOC: Multi Queue Group GuC interface 580 + * 581 + * The multi queue group coordination between KMD and GuC is through a software 582 + * construct called Context Group Page (CGP). The CGP is a KMD managed 4KB page 583 + * allocated in the global GTT. 
584 + * 585 + * CGP format: 586 + * 587 + * +-----------+---------------------------+---------------------------------------------+ 588 + * | DWORD | Name | Description | 589 + * +-----------+---------------------------+---------------------------------------------+ 590 + * | 0 | Version | Bits [15:8]=Major ver, [7:0]=Minor ver | 591 + * +-----------+---------------------------+---------------------------------------------+ 592 + * | 1..15 | RESERVED | MBZ | 593 + * +-----------+---------------------------+---------------------------------------------+ 594 + * | 16 | KMD_QUEUE_UPDATE_MASK_DW0 | KMD queue mask for queues 31..0 | 595 + * +-----------+---------------------------+---------------------------------------------+ 596 + * | 17 | KMD_QUEUE_UPDATE_MASK_DW1 | KMD queue mask for queues 63..32 | 597 + * +-----------+---------------------------+---------------------------------------------+ 598 + * | 18..31 | RESERVED | MBZ | 599 + * +-----------+---------------------------+---------------------------------------------+ 600 + * | 32 | Q0CD_DW0 | Queue 0 context LRC descriptor lower DWORD | 601 + * +-----------+---------------------------+---------------------------------------------+ 602 + * | 33 | Q0ContextIndex | Context ID for Queue 0 | 603 + * +-----------+---------------------------+---------------------------------------------+ 604 + * | 34 | Q1CD_DW0 | Queue 1 context LRC descriptor lower DWORD | 605 + * +-----------+---------------------------+---------------------------------------------+ 606 + * | 35 | Q1ContextIndex | Context ID for Queue 1 | 607 + * +-----------+---------------------------+---------------------------------------------+ 608 + * | ... |... | ... | 609 + * +-----------+---------------------------+---------------------------------------------+ 610 + * | 158 | Q63CD_DW0 | Queue 63 context LRC descriptor lower DWORD | 611 + * +-----------+---------------------------+---------------------------------------------+ 612 + * | 159 | Q63ContextIndex | Context ID for Queue 63 | 613 + * +-----------+---------------------------+---------------------------------------------+ 614 + * | 160..1024 | RESERVED | MBZ | 615 + * +-----------+---------------------------+---------------------------------------------+ 616 + * 617 + * While registering Q0 with GuC, CGP is updated with Q0 entry and GuC is notified 618 + * through XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_QUEUE H2G message which specifies 619 + * the CGP address. When the secondary queues are added to the group, the CGP is 620 + * updated with entry for that queue and GuC is notified through the H2G interface 621 + * XE_GUC_ACTION_MULTI_QUEUE_CONTEXT_CGP_SYNC. GuC responds to these H2G messages 622 + * with a XE_GUC_ACTION_NOTIFY_MULTIQ_CONTEXT_CGP_SYNC_DONE G2H message. GuC also 623 + * sends a XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CGP_CONTEXT_ERROR notification for any 624 + * error in the CGP. Only one of these CGP update messages can be outstanding 625 + * (waiting for GuC response) at any time. The bits in KMD_QUEUE_UPDATE_MASK_DW* 626 + * fields indicate which queue entry is being updated in the CGP. 627 + * 628 + * The primary queue (Q0) represents the multi queue group context in GuC and 629 + * submission on any queue of the group must be through Q0 GuC interface only. 630 + * 631 + * As it is not required to register secondary queues with GuC, the secondary queue 632 + * context ids in the CGP are populated with Q0 context id. 
633 + */ 634 + 635 + #define CGP_VERSION_MAJOR_SHIFT 8 636 + 637 + static void xe_guc_exec_queue_group_cgp_update(struct xe_device *xe, 638 + struct xe_exec_queue *q) 639 + { 640 + struct xe_exec_queue_group *group = q->multi_queue.group; 641 + u32 guc_id = group->primary->guc->id; 642 + 643 + /* Currently implementing CGP version 1.0 */ 644 + xe_map_wr(xe, &group->cgp_bo->vmap, 0, u32, 645 + 1 << CGP_VERSION_MAJOR_SHIFT); 646 + 647 + xe_map_wr(xe, &group->cgp_bo->vmap, 648 + (32 + q->multi_queue.pos * 2) * sizeof(u32), 649 + u32, lower_32_bits(xe_lrc_descriptor(q->lrc[0]))); 650 + 651 + xe_map_wr(xe, &group->cgp_bo->vmap, 652 + (33 + q->multi_queue.pos * 2) * sizeof(u32), 653 + u32, guc_id); 654 + 655 + if (q->multi_queue.pos / 32) { 656 + xe_map_wr(xe, &group->cgp_bo->vmap, 17 * sizeof(u32), 657 + u32, BIT(q->multi_queue.pos % 32)); 658 + xe_map_wr(xe, &group->cgp_bo->vmap, 16 * sizeof(u32), u32, 0); 659 + } else { 660 + xe_map_wr(xe, &group->cgp_bo->vmap, 16 * sizeof(u32), 661 + u32, BIT(q->multi_queue.pos)); 662 + xe_map_wr(xe, &group->cgp_bo->vmap, 17 * sizeof(u32), u32, 0); 663 + } 664 + } 665 + 666 + static void xe_guc_exec_queue_group_cgp_sync(struct xe_guc *guc, 667 + struct xe_exec_queue *q, 668 + const u32 *action, u32 len) 669 + { 670 + struct xe_exec_queue_group *group = q->multi_queue.group; 671 + struct xe_device *xe = guc_to_xe(guc); 672 + long ret; 673 + 674 + /* 675 + * As all queues of a multi queue group use single drm scheduler 676 + * submit workqueue, CGP synchronization with GuC are serialized. 677 + * Hence, no locking is required here. 678 + * Wait for any pending CGP_SYNC_DONE response before updating the 679 + * CGP page and sending CGP_SYNC message. 680 + * 681 + * FIXME: Support VF migration 682 + */ 683 + ret = wait_event_timeout(guc->ct.wq, 684 + !READ_ONCE(group->sync_pending) || 685 + xe_guc_read_stopped(guc), HZ); 686 + if (!ret || xe_guc_read_stopped(guc)) { 687 + /* CGP_SYNC failed. Reset gt, cleanup the group */ 688 + xe_gt_warn(guc_to_gt(guc), "Wait for CGP_SYNC_DONE response failed!\n"); 689 + set_exec_queue_group_banned(q); 690 + xe_gt_reset_async(q->gt); 691 + xe_guc_exec_queue_group_trigger_cleanup(q); 692 + return; 693 + } 694 + 695 + xe_lrc_set_multi_queue_priority(q->lrc[0], q->multi_queue.priority); 696 + xe_guc_exec_queue_group_cgp_update(xe, q); 697 + 698 + WRITE_ONCE(group->sync_pending, true); 699 + xe_guc_ct_send(&guc->ct, action, len, G2H_LEN_DW_MULTI_QUEUE_CONTEXT, 1); 700 + } 701 + 702 + static void __register_exec_queue_group(struct xe_guc *guc, 703 + struct xe_exec_queue *q, 704 + struct guc_ctxt_registration_info *info) 705 + { 706 + #define MAX_MULTI_QUEUE_REG_SIZE (8) 707 + u32 action[MAX_MULTI_QUEUE_REG_SIZE]; 708 + int len = 0; 709 + 710 + action[len++] = XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_QUEUE; 711 + action[len++] = info->flags; 712 + action[len++] = info->context_idx; 713 + action[len++] = info->engine_class; 714 + action[len++] = info->engine_submit_mask; 715 + action[len++] = 0; /* Reserved */ 716 + action[len++] = info->cgp_lo; 717 + action[len++] = info->cgp_hi; 718 + 719 + xe_gt_assert(guc_to_gt(guc), len <= MAX_MULTI_QUEUE_REG_SIZE); 720 + #undef MAX_MULTI_QUEUE_REG_SIZE 721 + 722 + /* 723 + * The above XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_QUEUE do expect a 724 + * XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE response 725 + * from guc. 
726 + */ 727 + xe_guc_exec_queue_group_cgp_sync(guc, q, action, len); 728 + } 729 + 730 + static void xe_guc_exec_queue_group_add(struct xe_guc *guc, 731 + struct xe_exec_queue *q) 732 + { 733 + #define MAX_MULTI_QUEUE_CGP_SYNC_SIZE (2) 734 + u32 action[MAX_MULTI_QUEUE_CGP_SYNC_SIZE]; 735 + int len = 0; 736 + 737 + xe_gt_assert(guc_to_gt(guc), xe_exec_queue_is_multi_queue_secondary(q)); 738 + 739 + action[len++] = XE_GUC_ACTION_MULTI_QUEUE_CONTEXT_CGP_SYNC; 740 + action[len++] = q->multi_queue.group->primary->guc->id; 741 + 742 + xe_gt_assert(guc_to_gt(guc), len <= MAX_MULTI_QUEUE_CGP_SYNC_SIZE); 743 + #undef MAX_MULTI_QUEUE_CGP_SYNC_SIZE 744 + 745 + /* 746 + * The above XE_GUC_ACTION_MULTI_QUEUE_CONTEXT_CGP_SYNC do expect a 747 + * XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE response 748 + * from guc. 749 + */ 750 + xe_guc_exec_queue_group_cgp_sync(guc, q, action, len); 751 + } 682 752 683 753 static void __register_mlrc_exec_queue(struct xe_guc *guc, 684 754 struct xe_exec_queue *q, ··· 950 670 info.flags = CONTEXT_REGISTRATION_FLAG_KMD | 951 671 FIELD_PREP(CONTEXT_REGISTRATION_FLAG_TYPE, ctx_type); 952 672 673 + if (xe_exec_queue_is_multi_queue(q)) { 674 + struct xe_exec_queue_group *group = q->multi_queue.group; 675 + 676 + info.cgp_lo = xe_bo_ggtt_addr(group->cgp_bo); 677 + info.cgp_hi = 0; 678 + } 679 + 953 680 if (xe_exec_queue_is_parallel(q)) { 954 681 u64 ggtt_addr = xe_lrc_parallel_ggtt_addr(lrc); 955 682 struct iosys_map map = xe_lrc_parallel_map(lrc); ··· 987 700 988 701 set_exec_queue_registered(q); 989 702 trace_xe_exec_queue_register(q); 990 - if (xe_exec_queue_is_parallel(q)) 703 + if (xe_exec_queue_is_multi_queue_primary(q)) 704 + __register_exec_queue_group(guc, q, &info); 705 + else if (xe_exec_queue_is_parallel(q)) 991 706 __register_mlrc_exec_queue(guc, q, &info); 992 - else 707 + else if (!xe_exec_queue_is_multi_queue_secondary(q)) 993 708 __register_exec_queue(guc, &info); 994 - init_policies(guc, q); 709 + 710 + if (!xe_exec_queue_is_multi_queue_secondary(q)) 711 + init_policies(guc, q); 712 + 713 + if (xe_exec_queue_is_multi_queue_secondary(q)) 714 + xe_guc_exec_queue_group_add(guc, q); 995 715 } 996 716 997 717 static u32 wq_space_until_wrap(struct xe_exec_queue *q) 998 718 { 999 719 return (WQ_SIZE - q->guc->wqi_tail); 1000 - } 1001 - 1002 - static bool vf_recovery(struct xe_guc *guc) 1003 - { 1004 - return xe_gt_recovery_pending(guc_to_gt(guc)); 1005 720 } 1006 721 1007 722 static inline void relaxed_ms_sleep(unsigned int delay_ms) ··· 1134 845 if (!job->restore_replay || job->last_replay) { 1135 846 if (xe_exec_queue_is_parallel(q)) 1136 847 wq_item_append(q); 1137 - else 848 + else if (!exec_queue_idle_skip_suspend(q)) 1138 849 xe_lrc_set_ring_tail(lrc, lrc->ring.tail); 1139 850 job->last_replay = false; 1140 851 } 1141 852 1142 853 if (exec_queue_suspended(q) && !xe_exec_queue_is_parallel(q)) 1143 854 return; 855 + 856 + /* 857 + * All queues in a multi-queue group will use the primary queue 858 + * of the group to interface with GuC. 
859 + */ 860 + q = xe_exec_queue_multi_queue_primary(q); 1144 861 1145 862 if (!exec_queue_enabled(q) && !exec_queue_suspended(q)) { 1146 863 action[len++] = XE_GUC_ACTION_SCHED_CONTEXT_MODE_SET; ··· 1194 899 trace_xe_sched_job_run(job); 1195 900 1196 901 if (!killed_or_banned_or_wedged && !xe_sched_job_is_error(job)) { 902 + if (xe_exec_queue_is_multi_queue_secondary(q)) { 903 + struct xe_exec_queue *primary = xe_exec_queue_multi_queue_primary(q); 904 + 905 + if (exec_queue_killed_or_banned_or_wedged(primary)) { 906 + killed_or_banned_or_wedged = true; 907 + goto run_job_out; 908 + } 909 + 910 + if (!exec_queue_registered(primary)) 911 + register_exec_queue(primary, GUC_CONTEXT_NORMAL); 912 + } 913 + 1197 914 if (!exec_queue_registered(q)) 1198 915 register_exec_queue(q, GUC_CONTEXT_NORMAL); 1199 916 if (!job->restore_replay) ··· 1214 907 job->restore_replay = false; 1215 908 } 1216 909 910 + run_job_out: 1217 911 /* 1218 912 * We don't care about job-fence ordering in LR VMs because these fences 1219 913 * are never exported; they are used solely to keep jobs on the pending ··· 1240 932 return atomic_read(&guc->submission_state.stopped); 1241 933 } 1242 934 935 + static void handle_multi_queue_secondary_sched_done(struct xe_guc *guc, 936 + struct xe_exec_queue *q, 937 + u32 runnable_state); 938 + static void handle_deregister_done(struct xe_guc *guc, struct xe_exec_queue *q); 939 + 1243 940 #define MAKE_SCHED_CONTEXT_ACTION(q, enable_disable) \ 1244 941 u32 action[] = { \ 1245 942 XE_GUC_ACTION_SCHED_CONTEXT_MODE_SET, \ ··· 1258 945 MAKE_SCHED_CONTEXT_ACTION(q, DISABLE); 1259 946 int ret; 1260 947 1261 - set_min_preemption_timeout(guc, q); 948 + if (!xe_exec_queue_is_multi_queue_secondary(q)) 949 + set_min_preemption_timeout(guc, q); 950 + 1262 951 smp_rmb(); 1263 952 ret = wait_event_timeout(guc->ct.wq, 1264 953 (!exec_queue_pending_enable(q) && ··· 1288 973 * Reserve space for both G2H here as the 2nd G2H is sent from a G2H 1289 974 * handler and we are not allowed to reserved G2H space in handlers. 
1290 975 */ 1291 - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 1292 - G2H_LEN_DW_SCHED_CONTEXT_MODE_SET + 1293 - G2H_LEN_DW_DEREGISTER_CONTEXT, 2); 1294 - } 1295 - 1296 - static void xe_guc_exec_queue_trigger_cleanup(struct xe_exec_queue *q) 1297 - { 1298 - struct xe_guc *guc = exec_queue_to_guc(q); 1299 - struct xe_device *xe = guc_to_xe(guc); 1300 - 1301 - /** to wakeup xe_wait_user_fence ioctl if exec queue is reset */ 1302 - wake_up_all(&xe->ufence_wq); 1303 - 1304 - if (xe_exec_queue_is_lr(q)) 1305 - queue_work(guc_to_gt(guc)->ordered_wq, &q->guc->lr_tdr); 976 + if (xe_exec_queue_is_multi_queue_secondary(q)) 977 + handle_multi_queue_secondary_sched_done(guc, q, 0); 1306 978 else 1307 - xe_sched_tdr_queue_imm(&q->guc->sched); 979 + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 980 + G2H_LEN_DW_SCHED_CONTEXT_MODE_SET + 981 + G2H_LEN_DW_DEREGISTER_CONTEXT, 2); 1308 982 } 1309 983 1310 984 /** ··· 1485 1181 set_exec_queue_enabled(q); 1486 1182 trace_xe_exec_queue_scheduling_enable(q); 1487 1183 1488 - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 1489 - G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1); 1184 + if (xe_exec_queue_is_multi_queue_secondary(q)) 1185 + handle_multi_queue_secondary_sched_done(guc, q, 1); 1186 + else 1187 + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 1188 + G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1); 1490 1189 1491 1190 ret = wait_event_timeout(guc->ct.wq, 1492 1191 !exec_queue_pending_enable(q) || ··· 1513 1206 xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q)); 1514 1207 xe_gt_assert(guc_to_gt(guc), !exec_queue_pending_disable(q)); 1515 1208 1516 - if (immediate) 1209 + if (immediate && !xe_exec_queue_is_multi_queue_secondary(q)) 1517 1210 set_min_preemption_timeout(guc, q); 1518 1211 clear_exec_queue_enabled(q); 1519 1212 set_exec_queue_pending_disable(q); 1520 1213 trace_xe_exec_queue_scheduling_disable(q); 1521 1214 1522 - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 1523 - G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1); 1215 + if (xe_exec_queue_is_multi_queue_secondary(q)) 1216 + handle_multi_queue_secondary_sched_done(guc, q, 0); 1217 + else 1218 + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 1219 + G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1); 1524 1220 } 1525 1221 1526 1222 static void __deregister_exec_queue(struct xe_guc *guc, struct xe_exec_queue *q) ··· 1541 1231 set_exec_queue_destroyed(q); 1542 1232 trace_xe_exec_queue_deregister(q); 1543 1233 1544 - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 1545 - G2H_LEN_DW_DEREGISTER_CONTEXT, 1); 1234 + if (xe_exec_queue_is_multi_queue_secondary(q)) 1235 + handle_deregister_done(guc, q); 1236 + else 1237 + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 1238 + G2H_LEN_DW_DEREGISTER_CONTEXT, 1); 1546 1239 } 1547 1240 1548 1241 static enum drm_gpu_sched_stat ··· 1558 1245 struct xe_guc *guc = exec_queue_to_guc(q); 1559 1246 const char *process_name = "no process"; 1560 1247 struct xe_device *xe = guc_to_xe(guc); 1561 - unsigned int fw_ref; 1562 1248 int err = -ETIME; 1563 1249 pid_t pid = -1; 1564 1250 int i = 0; ··· 1583 1271 exec_queue_killed_or_banned_or_wedged(q) || 1584 1272 exec_queue_destroyed(q); 1585 1273 1274 + /* Skip timeout check if multi-queue group is banned */ 1275 + if (xe_exec_queue_is_multi_queue(q) && 1276 + READ_ONCE(q->multi_queue.group->banned)) 1277 + skip_timeout_check = true; 1278 + 1279 + /* 1280 + * FIXME: In multi-queue scenario, the TDR must ensure that the whole 1281 + * multi-queue group is off the HW before signaling the fences to avoid 1282 + * 
possible memory corruptions. This means disabling scheduling on the 1283 + * primary queue before or during the secondary queue's TDR. Need to 1284 + * implement this in least obtrusive way. 1285 + */ 1286 + 1586 1287 /* 1587 1288 * If devcoredump not captured and GuC capture for the job is not ready 1588 1289 * do manual capture first and decide later if we need to use it ··· 1603 1278 if (!exec_queue_killed(q) && !xe->devcoredump.captured && 1604 1279 !xe_guc_capture_get_matching_and_lock(q)) { 1605 1280 /* take force wake before engine register manual capture */ 1606 - fw_ref = xe_force_wake_get(gt_to_fw(q->gt), XE_FORCEWAKE_ALL); 1607 - if (!xe_force_wake_ref_has_domain(fw_ref, XE_FORCEWAKE_ALL)) 1281 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(q->gt), XE_FORCEWAKE_ALL); 1282 + if (!xe_force_wake_ref_has_domain(fw_ref.domains, XE_FORCEWAKE_ALL)) 1608 1283 xe_gt_info(q->gt, "failed to get forcewake for coredump capture\n"); 1609 1284 1610 1285 xe_engine_snapshot_capture_for_queue(q); 1611 - 1612 - xe_force_wake_put(gt_to_fw(q->gt), fw_ref); 1613 1286 } 1614 1287 1615 1288 /* ··· 1748 1425 xe_sched_add_pending_job(sched, job); 1749 1426 xe_sched_submission_start(sched); 1750 1427 1751 - xe_guc_exec_queue_trigger_cleanup(q); 1428 + if (xe_exec_queue_is_multi_queue(q)) 1429 + xe_guc_exec_queue_group_trigger_cleanup(q); 1430 + else 1431 + xe_guc_exec_queue_trigger_cleanup(q); 1752 1432 1753 1433 /* Mark all outstanding jobs as bad, thus completing them */ 1754 1434 spin_lock(&sched->base.job_list_lock); ··· 1801 1475 struct xe_exec_queue *q = ge->q; 1802 1476 struct xe_guc *guc = exec_queue_to_guc(q); 1803 1477 1804 - xe_pm_runtime_get(guc_to_xe(guc)); 1478 + guard(xe_pm_runtime)(guc_to_xe(guc)); 1805 1479 trace_xe_exec_queue_destroy(q); 1480 + 1481 + if (xe_exec_queue_is_multi_queue_secondary(q)) { 1482 + struct xe_exec_queue_group *group = q->multi_queue.group; 1483 + 1484 + mutex_lock(&group->list_lock); 1485 + list_del(&q->multi_queue.link); 1486 + mutex_unlock(&group->list_lock); 1487 + } 1806 1488 1807 1489 if (xe_exec_queue_is_lr(q)) 1808 1490 cancel_work_sync(&ge->lr_tdr); ··· 1818 1484 cancel_delayed_work_sync(&ge->sched.base.work_tdr); 1819 1485 1820 1486 xe_exec_queue_fini(q); 1821 - 1822 - xe_pm_runtime_put(guc_to_xe(guc)); 1823 1487 } 1824 1488 1825 1489 static void guc_exec_queue_destroy_async(struct xe_exec_queue *q) ··· 1922 1590 { 1923 1591 struct xe_exec_queue *q = msg->private_data; 1924 1592 struct xe_guc *guc = exec_queue_to_guc(q); 1593 + bool idle_skip_suspend = xe_exec_queue_idle_skip_suspend(q); 1925 1594 1926 - if (guc_exec_queue_allowed_to_change_state(q) && !exec_queue_suspended(q) && 1927 - exec_queue_enabled(q)) { 1595 + if (!idle_skip_suspend && guc_exec_queue_allowed_to_change_state(q) && 1596 + !exec_queue_suspended(q) && exec_queue_enabled(q)) { 1928 1597 wait_event(guc->ct.wq, vf_recovery(guc) || 1929 1598 ((q->guc->resume_time != RESUME_PENDING || 1930 1599 xe_guc_read_stopped(guc)) && !exec_queue_pending_disable(q))); ··· 1944 1611 disable_scheduling(q, false); 1945 1612 } 1946 1613 } else if (q->guc->suspend_pending) { 1614 + if (idle_skip_suspend) 1615 + set_exec_queue_idle_skip_suspend(q); 1947 1616 set_exec_queue_suspended(q); 1948 1617 suspend_fence_signal(q); 1949 1618 } 1619 + } 1620 + 1621 + static void sched_context(struct xe_exec_queue *q) 1622 + { 1623 + struct xe_guc *guc = exec_queue_to_guc(q); 1624 + struct xe_lrc *lrc = q->lrc[0]; 1625 + u32 action[] = { 1626 + XE_GUC_ACTION_SCHED_CONTEXT, 1627 + q->guc->id, 1628 + }; 1629 + 1630 + 
xe_gt_assert(guc_to_gt(guc), !xe_exec_queue_is_parallel(q)); 1631 + xe_gt_assert(guc_to_gt(guc), !exec_queue_destroyed(q)); 1632 + xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q)); 1633 + xe_gt_assert(guc_to_gt(guc), !exec_queue_pending_disable(q)); 1634 + 1635 + trace_xe_exec_queue_submit(q); 1636 + 1637 + xe_lrc_set_ring_tail(lrc, lrc->ring.tail); 1638 + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0); 1950 1639 } 1951 1640 1952 1641 static void __guc_exec_queue_process_msg_resume(struct xe_sched_msg *msg) ··· 1978 1623 if (guc_exec_queue_allowed_to_change_state(q)) { 1979 1624 clear_exec_queue_suspended(q); 1980 1625 if (!exec_queue_enabled(q)) { 1626 + if (exec_queue_idle_skip_suspend(q)) { 1627 + struct xe_lrc *lrc = q->lrc[0]; 1628 + 1629 + clear_exec_queue_idle_skip_suspend(q); 1630 + xe_lrc_set_ring_tail(lrc, lrc->ring.tail); 1631 + } 1981 1632 q->guc->resume_time = RESUME_PENDING; 1982 1633 set_exec_queue_pending_resume(q); 1983 1634 enable_scheduling(q); 1635 + } else if (exec_queue_idle_skip_suspend(q)) { 1636 + clear_exec_queue_idle_skip_suspend(q); 1637 + sched_context(q); 1984 1638 } 1985 1639 } else { 1986 1640 clear_exec_queue_suspended(q); 1641 + clear_exec_queue_idle_skip_suspend(q); 1987 1642 } 1988 1643 } 1989 1644 1990 - #define CLEANUP 1 /* Non-zero values to catch uninitialized msg */ 1991 - #define SET_SCHED_PROPS 2 1992 - #define SUSPEND 3 1993 - #define RESUME 4 1645 + static void __guc_exec_queue_process_msg_set_multi_queue_priority(struct xe_sched_msg *msg) 1646 + { 1647 + struct xe_exec_queue *q = msg->private_data; 1648 + 1649 + if (guc_exec_queue_allowed_to_change_state(q)) { 1650 + #define MAX_MULTI_QUEUE_CGP_SYNC_SIZE (2) 1651 + struct xe_guc *guc = exec_queue_to_guc(q); 1652 + struct xe_exec_queue_group *group = q->multi_queue.group; 1653 + u32 action[MAX_MULTI_QUEUE_CGP_SYNC_SIZE]; 1654 + int len = 0; 1655 + 1656 + action[len++] = XE_GUC_ACTION_MULTI_QUEUE_CONTEXT_CGP_SYNC; 1657 + action[len++] = group->primary->guc->id; 1658 + 1659 + xe_gt_assert(guc_to_gt(guc), len <= MAX_MULTI_QUEUE_CGP_SYNC_SIZE); 1660 + #undef MAX_MULTI_QUEUE_CGP_SYNC_SIZE 1661 + 1662 + xe_guc_exec_queue_group_cgp_sync(guc, q, action, len); 1663 + } 1664 + 1665 + kfree(msg); 1666 + } 1667 + 1668 + #define CLEANUP 1 /* Non-zero values to catch uninitialized msg */ 1669 + #define SET_SCHED_PROPS 2 1670 + #define SUSPEND 3 1671 + #define RESUME 4 1672 + #define SET_MULTI_QUEUE_PRIORITY 5 1994 1673 #define OPCODE_MASK 0xf 1995 1674 #define MSG_LOCKED BIT(8) 1996 1675 #define MSG_HEAD BIT(9) ··· 2048 1659 case RESUME: 2049 1660 __guc_exec_queue_process_msg_resume(msg); 2050 1661 break; 1662 + case SET_MULTI_QUEUE_PRIORITY: 1663 + __guc_exec_queue_process_msg_set_multi_queue_priority(msg); 1664 + break; 2051 1665 default: 2052 1666 XE_WARN_ON("Unknown message type"); 2053 1667 } ··· 2072 1680 { 2073 1681 struct xe_gpu_scheduler *sched; 2074 1682 struct xe_guc *guc = exec_queue_to_guc(q); 1683 + struct workqueue_struct *submit_wq = NULL; 2075 1684 struct xe_guc_exec_queue *ge; 2076 1685 long timeout; 2077 1686 int err, i; ··· 2093 1700 2094 1701 timeout = (q->vm && xe_vm_in_lr_mode(q->vm)) ? MAX_SCHEDULE_TIMEOUT : 2095 1702 msecs_to_jiffies(q->sched_props.job_timeout_ms); 1703 + 1704 + /* 1705 + * Use primary queue's submit_wq for all secondary queues of a 1706 + * multi queue group. This serialization avoids any locking around 1707 + * CGP synchronization with GuC. 
1708 + */ 1709 + if (xe_exec_queue_is_multi_queue_secondary(q)) { 1710 + struct xe_exec_queue *primary = xe_exec_queue_multi_queue_primary(q); 1711 + 1712 + submit_wq = primary->guc->sched.base.submit_wq; 1713 + } 1714 + 2096 1715 err = xe_sched_init(&ge->sched, &drm_sched_ops, &xe_sched_ops, 2097 - NULL, xe_lrc_ring_size() / MAX_JOB_SIZE_BYTES, 64, 1716 + submit_wq, xe_lrc_ring_size() / MAX_JOB_SIZE_BYTES, 64, 2098 1717 timeout, guc_to_gt(guc)->ordered_wq, NULL, 2099 1718 q->name, gt_to_xe(q->gt)->drm.dev); 2100 1719 if (err) ··· 2135 1730 2136 1731 xe_exec_queue_assign_name(q, q->guc->id); 2137 1732 2138 - trace_xe_exec_queue_create(q); 1733 + /* 1734 + * Maintain secondary queues of the multi queue group in a list 1735 + * for handling dependencies across the queues in the group. 1736 + */ 1737 + if (xe_exec_queue_is_multi_queue_secondary(q)) { 1738 + struct xe_exec_queue_group *group = q->multi_queue.group; 1739 + 1740 + INIT_LIST_HEAD(&q->multi_queue.link); 1741 + mutex_lock(&group->list_lock); 1742 + list_add_tail(&q->multi_queue.link, &group->list); 1743 + mutex_unlock(&group->list_lock); 1744 + } 1745 + 1746 + if (xe_exec_queue_is_multi_queue(q)) 1747 + trace_xe_exec_queue_create_multi_queue(q); 1748 + else 1749 + trace_xe_exec_queue_create(q); 2139 1750 2140 1751 return 0; 2141 1752 ··· 2283 1862 return 0; 2284 1863 } 2285 1864 1865 + static int guc_exec_queue_set_multi_queue_priority(struct xe_exec_queue *q, 1866 + enum xe_multi_queue_priority priority) 1867 + { 1868 + struct xe_sched_msg *msg; 1869 + 1870 + xe_gt_assert(guc_to_gt(exec_queue_to_guc(q)), xe_exec_queue_is_multi_queue(q)); 1871 + 1872 + if (q->multi_queue.priority == priority || 1873 + exec_queue_killed_or_banned_or_wedged(q)) 1874 + return 0; 1875 + 1876 + msg = kmalloc(sizeof(*msg), GFP_KERNEL); 1877 + if (!msg) 1878 + return -ENOMEM; 1879 + 1880 + q->multi_queue.priority = priority; 1881 + guc_exec_queue_add_msg(q, msg, SET_MULTI_QUEUE_PRIORITY); 1882 + 1883 + return 0; 1884 + } 1885 + 2286 1886 static int guc_exec_queue_suspend(struct xe_exec_queue *q) 2287 1887 { 2288 1888 struct xe_gpu_scheduler *sched = &q->guc->sched; ··· 2378 1936 2379 1937 static bool guc_exec_queue_reset_status(struct xe_exec_queue *q) 2380 1938 { 1939 + if (xe_exec_queue_is_multi_queue_secondary(q) && 1940 + guc_exec_queue_reset_status(xe_exec_queue_multi_queue_primary(q))) 1941 + return true; 1942 + 2381 1943 return exec_queue_reset(q) || exec_queue_killed_or_banned_or_wedged(q); 2382 1944 } 2383 1945 ··· 2399 1953 .set_priority = guc_exec_queue_set_priority, 2400 1954 .set_timeslice = guc_exec_queue_set_timeslice, 2401 1955 .set_preempt_timeout = guc_exec_queue_set_preempt_timeout, 1956 + .set_multi_queue_priority = guc_exec_queue_set_multi_queue_priority, 2402 1957 .suspend = guc_exec_queue_suspend, 2403 1958 .suspend_wait = guc_exec_queue_suspend_wait, 2404 1959 .resume = guc_exec_queue_resume, ··· 2649 2202 struct xe_exec_queue *q; 2650 2203 unsigned long index; 2651 2204 2205 + mutex_lock(&guc->submission_state.lock); 2206 + xa_for_each(&guc->submission_state.exec_queue_lookup, index, q) 2207 + xe_sched_submission_stop(&q->guc->sched); 2208 + mutex_unlock(&guc->submission_state.lock); 2209 + } 2210 + 2211 + /** 2212 + * xe_guc_submit_pause_vf - Stop further runs of submission tasks for VF. 
2213 + * @guc: the &xe_guc struct instance whose scheduler is to be disabled 2214 + */ 2215 + void xe_guc_submit_pause_vf(struct xe_guc *guc) 2216 + { 2217 + struct xe_exec_queue *q; 2218 + unsigned long index; 2219 + 2220 + xe_gt_assert(guc_to_gt(guc), IS_SRIOV_VF(guc_to_xe(guc))); 2652 2221 xe_gt_assert(guc_to_gt(guc), vf_recovery(guc)); 2653 2222 2654 2223 mutex_lock(&guc->submission_state.lock); ··· 2756 2293 } 2757 2294 2758 2295 /** 2759 - * xe_guc_submit_unpause_prepare - Prepare unpause submission tasks on given GuC. 2296 + * xe_guc_submit_unpause_prepare_vf - Prepare unpause submission tasks for VF. 2760 2297 * @guc: the &xe_guc struct instance whose scheduler is to be prepared for unpause 2761 2298 */ 2762 - void xe_guc_submit_unpause_prepare(struct xe_guc *guc) 2299 + void xe_guc_submit_unpause_prepare_vf(struct xe_guc *guc) 2763 2300 { 2764 2301 struct xe_exec_queue *q; 2765 2302 unsigned long index; 2766 2303 2304 + xe_gt_assert(guc_to_gt(guc), IS_SRIOV_VF(guc_to_xe(guc))); 2767 2305 xe_gt_assert(guc_to_gt(guc), vf_recovery(guc)); 2768 2306 2769 2307 mutex_lock(&guc->submission_state.lock); ··· 2839 2375 { 2840 2376 struct xe_exec_queue *q; 2841 2377 unsigned long index; 2378 + 2379 + mutex_lock(&guc->submission_state.lock); 2380 + xa_for_each(&guc->submission_state.exec_queue_lookup, index, q) 2381 + xe_sched_submission_start(&q->guc->sched); 2382 + mutex_unlock(&guc->submission_state.lock); 2383 + } 2384 + 2385 + /** 2386 + * xe_guc_submit_unpause_vf - Allow further runs of submission tasks for VF. 2387 + * @guc: the &xe_guc struct instance whose scheduler is to be enabled 2388 + */ 2389 + void xe_guc_submit_unpause_vf(struct xe_guc *guc) 2390 + { 2391 + struct xe_exec_queue *q; 2392 + unsigned long index; 2393 + 2394 + xe_gt_assert(guc_to_gt(guc), IS_SRIOV_VF(guc_to_xe(guc))); 2842 2395 2843 2396 mutex_lock(&guc->submission_state.lock); 2844 2397 xa_for_each(&guc->submission_state.exec_queue_lookup, index, q) { ··· 2933 2452 2934 2453 trace_xe_exec_queue_deregister(q); 2935 2454 2936 - xe_guc_ct_send_g2h_handler(&guc->ct, action, ARRAY_SIZE(action)); 2455 + if (xe_exec_queue_is_multi_queue_secondary(q)) 2456 + handle_deregister_done(guc, q); 2457 + else 2458 + xe_guc_ct_send_g2h_handler(&guc->ct, action, 2459 + ARRAY_SIZE(action)); 2937 2460 } 2938 2461 2939 2462 static void handle_sched_done(struct xe_guc *guc, struct xe_exec_queue *q, ··· 2985 2500 } 2986 2501 } 2987 2502 } 2503 + } 2504 + 2505 + static void handle_multi_queue_secondary_sched_done(struct xe_guc *guc, 2506 + struct xe_exec_queue *q, 2507 + u32 runnable_state) 2508 + { 2509 + /* Take CT lock here as handle_sched_done() do send a h2g message */ 2510 + mutex_lock(&guc->ct.lock); 2511 + handle_sched_done(guc, q, runnable_state); 2512 + mutex_unlock(&guc->ct.lock); 2988 2513 } 2989 2514 2990 2515 int xe_guc_sched_done_handler(struct xe_guc *guc, u32 *msg, u32 len) ··· 3080 2585 if (unlikely(!q)) 3081 2586 return -EPROTO; 3082 2587 3083 - xe_gt_info(gt, "Engine reset: engine_class=%s, logical_mask: 0x%x, guc_id=%d", 3084 - xe_hw_engine_class_to_str(q->class), q->logical_mask, guc_id); 2588 + xe_gt_info(gt, "Engine reset: engine_class=%s, logical_mask: 0x%x, guc_id=%d, state=0x%0x", 2589 + xe_hw_engine_class_to_str(q->class), q->logical_mask, guc_id, 2590 + atomic_read(&q->guc->state)); 3085 2591 3086 2592 trace_xe_exec_queue_reset(q); 3087 2593 ··· 3092 2596 * jobs by setting timeout of the job to the minimum value kicking 3093 2597 * guc_exec_queue_timedout_job. 
3094 2598 */ 3095 - set_exec_queue_reset(q); 3096 - if (!exec_queue_banned(q) && !exec_queue_check_timeout(q)) 3097 - xe_guc_exec_queue_trigger_cleanup(q); 2599 + xe_guc_exec_queue_reset_trigger_cleanup(q); 3098 2600 3099 2601 return 0; 3100 2602 } ··· 3160 2666 * See bspec 54047 and 72187 for details. 3161 2667 */ 3162 2668 if (type != XE_GUC_CAT_ERR_TYPE_INVALID) 3163 - xe_gt_dbg(gt, 3164 - "Engine memory CAT error [%u]: class=%s, logical_mask: 0x%x, guc_id=%d", 3165 - type, xe_hw_engine_class_to_str(q->class), q->logical_mask, guc_id); 2669 + xe_gt_info(gt, 2670 + "Engine memory CAT error [%u]: class=%s, logical_mask: 0x%x, guc_id=%d", 2671 + type, xe_hw_engine_class_to_str(q->class), q->logical_mask, guc_id); 3166 2672 else 3167 - xe_gt_dbg(gt, 3168 - "Engine memory CAT error: class=%s, logical_mask: 0x%x, guc_id=%d", 3169 - xe_hw_engine_class_to_str(q->class), q->logical_mask, guc_id); 2673 + xe_gt_info(gt, 2674 + "Engine memory CAT error: class=%s, logical_mask: 0x%x, guc_id=%d", 2675 + xe_hw_engine_class_to_str(q->class), q->logical_mask, guc_id); 3170 2676 3171 2677 trace_xe_exec_queue_memory_cat_error(q); 3172 2678 3173 2679 /* Treat the same as engine reset */ 3174 - set_exec_queue_reset(q); 3175 - if (!exec_queue_banned(q) && !exec_queue_check_timeout(q)) 3176 - xe_guc_exec_queue_trigger_cleanup(q); 2680 + xe_guc_exec_queue_reset_trigger_cleanup(q); 3177 2681 3178 2682 return 0; 3179 2683 } ··· 3194 2702 guc_class, instance, reason); 3195 2703 3196 2704 xe_gt_reset_async(gt); 2705 + 2706 + return 0; 2707 + } 2708 + 2709 + int xe_guc_exec_queue_cgp_context_error_handler(struct xe_guc *guc, u32 *msg, 2710 + u32 len) 2711 + { 2712 + struct xe_gt *gt = guc_to_gt(guc); 2713 + struct xe_device *xe = guc_to_xe(guc); 2714 + struct xe_exec_queue *q; 2715 + u32 guc_id = msg[2]; 2716 + 2717 + if (unlikely(len != XE_GUC_EXEC_QUEUE_CGP_CONTEXT_ERROR_LEN)) { 2718 + drm_err(&xe->drm, "Invalid length %u", len); 2719 + return -EPROTO; 2720 + } 2721 + 2722 + q = g2h_exec_queue_lookup(guc, guc_id); 2723 + if (unlikely(!q)) 2724 + return -EPROTO; 2725 + 2726 + xe_gt_dbg(gt, 2727 + "CGP context error: [%s] err=0x%x, q0_id=0x%x LRCA=0x%x guc_id=0x%x", 2728 + msg[0] & 1 ? "uc" : "kmd", msg[1], msg[2], msg[3], msg[4]); 2729 + 2730 + trace_xe_exec_queue_cgp_context_error(q); 2731 + 2732 + /* Treat the same as engine reset */ 2733 + xe_guc_exec_queue_reset_trigger_cleanup(q); 2734 + 2735 + return 0; 2736 + } 2737 + 2738 + /** 2739 + * xe_guc_exec_queue_cgp_sync_done_handler - CGP synchronization done handler 2740 + * @guc: guc 2741 + * @msg: message indicating CGP sync done 2742 + * @len: length of message 2743 + * 2744 + * Set multi queue group's sync_pending flag to false and wakeup anyone waiting 2745 + * for CGP synchronization to complete. 2746 + * 2747 + * Return: 0 on success, -EPROTO for malformed messages. 
2748 + */ 2749 + int xe_guc_exec_queue_cgp_sync_done_handler(struct xe_guc *guc, u32 *msg, u32 len) 2750 + { 2751 + struct xe_device *xe = guc_to_xe(guc); 2752 + struct xe_exec_queue *q; 2753 + u32 guc_id = msg[0]; 2754 + 2755 + if (unlikely(len < 1)) { 2756 + drm_err(&xe->drm, "Invalid CGP_SYNC_DONE length %u", len); 2757 + return -EPROTO; 2758 + } 2759 + 2760 + q = g2h_exec_queue_lookup(guc, guc_id); 2761 + if (unlikely(!q)) 2762 + return -EPROTO; 2763 + 2764 + if (!xe_exec_queue_is_multi_queue_primary(q)) { 2765 + drm_err(&xe->drm, "Unexpected CGP_SYNC_DONE response"); 2766 + return -EPROTO; 2767 + } 2768 + 2769 + /* Wakeup the serialized cgp update wait */ 2770 + WRITE_ONCE(q->multi_queue.group->sync_pending, false); 2771 + xe_guc_ct_wake_waiters(&guc->ct); 3197 2772 3198 2773 return 0; 3199 2774 } ··· 3364 2805 if (snapshot->parallel_execution) 3365 2806 guc_exec_queue_wq_snapshot_capture(q, snapshot); 3366 2807 2808 + if (xe_exec_queue_is_multi_queue(q)) { 2809 + snapshot->multi_queue.valid = true; 2810 + snapshot->multi_queue.primary = xe_exec_queue_multi_queue_primary(q)->guc->id; 2811 + snapshot->multi_queue.pos = q->multi_queue.pos; 2812 + } 3367 2813 spin_lock(&sched->base.job_list_lock); 3368 2814 snapshot->pending_list_size = list_count_nodes(&sched->base.pending_list); 3369 2815 snapshot->pending_list = kmalloc_array(snapshot->pending_list_size, ··· 3450 2886 3451 2887 if (snapshot->parallel_execution) 3452 2888 guc_exec_queue_wq_snapshot_print(snapshot, p); 2889 + 2890 + if (snapshot->multi_queue.valid) { 2891 + drm_printf(p, "\tMulti queue primary GuC ID: %d\n", snapshot->multi_queue.primary); 2892 + drm_printf(p, "\tMulti queue position: %d\n", snapshot->multi_queue.pos); 2893 + } 3453 2894 3454 2895 for (i = 0; snapshot->pending_list && i < snapshot->pending_list_size; 3455 2896 i++)
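The new CGP handlers above follow the usual G2H pattern: validate the message length, look up the exec queue by GuC id, reject unexpected senders, and only then touch driver state. A minimal standalone sketch of that validation order; the length, queue fields and lookup below are illustrative stand-ins, not the real GuC ABI:

#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_CGP_SYNC_DONE_LEN 1	/* illustrative, not the real ABI length */

struct demo_queue { uint32_t id; bool is_primary; bool sync_pending; };

static struct demo_queue *demo_lookup(uint32_t guc_id)
{
	static struct demo_queue q = { .id = 7, .is_primary = true, .sync_pending = true };

	return guc_id == q.id ? &q : NULL;	/* stand-in for g2h_exec_queue_lookup() */
}

/* Validate before acting: length first, then lookup, then sender sanity. */
static int demo_cgp_sync_done(const uint32_t *msg, uint32_t len)
{
	struct demo_queue *q;

	if (len < DEMO_CGP_SYNC_DONE_LEN)
		return -EPROTO;

	q = demo_lookup(msg[0]);
	if (!q || !q->is_primary)
		return -EPROTO;

	q->sync_pending = false;	/* kernel side uses WRITE_ONCE() and wakes CT waiters */
	return 0;
}

int main(void)
{
	uint32_t good[] = { 7 };

	printf("short msg -> %d\n", demo_cgp_sync_done(good, 0));
	printf("good msg  -> %d\n", demo_cgp_sync_done(good, 1));
	return 0;
}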
+7 -2
drivers/gpu/drm/xe/xe_guc_submit.h
··· 21 21 void xe_guc_submit_stop(struct xe_guc *guc); 22 22 int xe_guc_submit_start(struct xe_guc *guc); 23 23 void xe_guc_submit_pause(struct xe_guc *guc); 24 - void xe_guc_submit_unpause(struct xe_guc *guc); 25 - void xe_guc_submit_unpause_prepare(struct xe_guc *guc); 26 24 void xe_guc_submit_pause_abort(struct xe_guc *guc); 25 + void xe_guc_submit_pause_vf(struct xe_guc *guc); 26 + void xe_guc_submit_unpause(struct xe_guc *guc); 27 + void xe_guc_submit_unpause_vf(struct xe_guc *guc); 28 + void xe_guc_submit_unpause_prepare_vf(struct xe_guc *guc); 27 29 void xe_guc_submit_wedge(struct xe_guc *guc); 28 30 29 31 int xe_guc_read_stopped(struct xe_guc *guc); ··· 36 34 u32 len); 37 35 int xe_guc_exec_queue_reset_failure_handler(struct xe_guc *guc, u32 *msg, u32 len); 38 36 int xe_guc_error_capture_handler(struct xe_guc *guc, u32 *msg, u32 len); 37 + int xe_guc_exec_queue_cgp_sync_done_handler(struct xe_guc *guc, u32 *msg, u32 len); 38 + int xe_guc_exec_queue_cgp_context_error_handler(struct xe_guc *guc, u32 *msg, 39 + u32 len); 39 40 40 41 struct xe_guc_submit_exec_queue_snapshot * 41 42 xe_guc_exec_queue_snapshot_capture(struct xe_exec_queue *q);
+13
drivers/gpu/drm/xe/xe_guc_submit_types.h
··· 135 135 u32 wq[WQ_SIZE / sizeof(u32)]; 136 136 } parallel; 137 137 138 + /** @multi_queue: snapshot of the multi queue information */ 139 + struct { 140 + /** 141 + * @multi_queue.primary: GuC id of the primary exec queue 142 + * of the multi queue group. 143 + */ 144 + u32 primary; 145 + /** @multi_queue.pos: Position of the exec queue within the multi queue group */ 146 + u8 pos; 147 + /** @valid: The exec queue is part of a multi queue group */ 148 + bool valid; 149 + } multi_queue; 150 + 138 151 /** @pending_list_size: Size of the pending list snapshot array */ 139 152 int pending_list_size; 140 153 /** @pending_list: snapshot of the pending list info */
+31 -10
drivers/gpu/drm/xe/xe_guc_tlb_inval.c
··· 13 13 #include "xe_guc_tlb_inval.h" 14 14 #include "xe_force_wake.h" 15 15 #include "xe_mmio.h" 16 + #include "xe_sa.h" 16 17 #include "xe_tlb_inval.h" 17 18 18 19 #include "regs/xe_guc_regs.h" ··· 35 34 G2H_LEN_DW_TLB_INVALIDATE, 1); 36 35 } 37 36 38 - #define MAKE_INVAL_OP(type) ((type << XE_GUC_TLB_INVAL_TYPE_SHIFT) | \ 37 + #define MAKE_INVAL_OP_FLUSH(type, flush_cache) ((type << XE_GUC_TLB_INVAL_TYPE_SHIFT) | \ 39 38 XE_GUC_TLB_INVAL_MODE_HEAVY << XE_GUC_TLB_INVAL_MODE_SHIFT | \ 40 - XE_GUC_TLB_INVAL_FLUSH_CACHE) 39 + (flush_cache ? \ 40 + XE_GUC_TLB_INVAL_FLUSH_CACHE : 0)) 41 + 42 + #define MAKE_INVAL_OP(type) MAKE_INVAL_OP_FLUSH(type, true) 41 43 42 44 static int send_tlb_inval_all(struct xe_tlb_inval *tlb_inval, u32 seqno) 43 45 { ··· 75 71 return send_tlb_inval(guc, action, ARRAY_SIZE(action)); 76 72 } else if (xe_device_uc_enabled(xe) && !xe_device_wedged(xe)) { 77 73 struct xe_mmio *mmio = &gt->mmio; 78 - unsigned int fw_ref; 79 74 80 75 if (IS_SRIOV_VF(xe)) 81 76 return -ECANCELED; 82 77 83 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); 78 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT); 84 79 if (xe->info.platform == XE_PVC || GRAPHICS_VER(xe) >= 20) { 85 80 xe_mmio_write32(mmio, PVC_GUC_TLB_INV_DESC1, 86 81 PVC_GUC_TLB_INV_DESC1_INVALIDATE); ··· 89 86 xe_mmio_write32(mmio, GUC_TLB_INV_CR, 90 87 GUC_TLB_INV_CR_INVALIDATE); 91 88 } 92 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 93 89 } 94 90 95 91 return -ECANCELED; 92 + } 93 + 94 + static int send_page_reclaim(struct xe_guc *guc, u32 seqno, 95 + u64 gpu_addr) 96 + { 97 + u32 action[] = { 98 + XE_GUC_ACTION_PAGE_RECLAMATION, 99 + seqno, 100 + lower_32_bits(gpu_addr), 101 + upper_32_bits(gpu_addr), 102 + }; 103 + 104 + return xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 105 + G2H_LEN_DW_PAGE_RECLAMATION, 1); 96 106 } 97 107 98 108 /* ··· 116 100 #define MAX_RANGE_TLB_INVALIDATION_LENGTH (rounddown_pow_of_two(ULONG_MAX)) 117 101 118 102 static int send_tlb_inval_ppgtt(struct xe_tlb_inval *tlb_inval, u32 seqno, 119 - u64 start, u64 end, u32 asid) 103 + u64 start, u64 end, u32 asid, 104 + struct drm_suballoc *prl_sa) 120 105 { 121 106 #define MAX_TLB_INVALIDATION_LEN 7 122 107 struct xe_guc *guc = tlb_inval->private; 123 108 struct xe_gt *gt = guc_to_gt(guc); 124 109 u32 action[MAX_TLB_INVALIDATION_LEN]; 125 110 u64 length = end - start; 126 - int len = 0; 111 + int len = 0, err; 127 112 128 113 if (guc_to_xe(guc)->info.force_execlist) 129 114 return -ECANCELED; 130 115 131 116 action[len++] = XE_GUC_ACTION_TLB_INVALIDATION; 132 - action[len++] = seqno; 117 + action[len++] = !prl_sa ? 
seqno : TLB_INVALIDATION_SEQNO_INVALID; 133 118 if (!gt_to_xe(gt)->info.has_range_tlb_inval || 134 119 length > MAX_RANGE_TLB_INVALIDATION_LENGTH) { 135 120 action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL); ··· 171 154 ilog2(SZ_2M) + 1))); 172 155 xe_gt_assert(gt, IS_ALIGNED(start, length)); 173 156 174 - action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_PAGE_SELECTIVE); 157 + /* Flush on NULL case, Media is not required to modify flush due to no PPC so NOP */ 158 + action[len++] = MAKE_INVAL_OP_FLUSH(XE_GUC_TLB_INVAL_PAGE_SELECTIVE, !prl_sa); 175 159 action[len++] = asid; 176 160 action[len++] = lower_32_bits(start); 177 161 action[len++] = upper_32_bits(start); ··· 181 163 182 164 xe_gt_assert(gt, len <= MAX_TLB_INVALIDATION_LEN); 183 165 184 - return send_tlb_inval(guc, action, len); 166 + err = send_tlb_inval(guc, action, len); 167 + if (!err && prl_sa) 168 + err = send_page_reclaim(guc, seqno, xe_sa_bo_gpu_addr(prl_sa)); 169 + return err; 185 170 } 186 171 187 172 static bool tlb_inval_initialized(struct xe_tlb_inval *tlb_inval)
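MAKE_INVAL_OP_FLUSH() makes the cache-flush bit optional: when a page-reclaim list (PRL) accompanies the invalidation, the flush is skipped and the real seqno rides on the follow-up XE_GUC_ACTION_PAGE_RECLAMATION message instead. A standalone sketch of the conditional-flag macro; the shift and flag values below are illustrative, not the real GuC ABI:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative values only -- the real shifts/flags live in the GuC ABI headers. */
#define DEMO_TYPE_SHIFT		0
#define DEMO_MODE_SHIFT		8
#define DEMO_MODE_HEAVY		1u
#define DEMO_FLUSH_CACHE	(1u << 31)

#define DEMO_INVAL_OP_FLUSH(type, flush_cache)			\
	(((type) << DEMO_TYPE_SHIFT) |				\
	 (DEMO_MODE_HEAVY << DEMO_MODE_SHIFT) |			\
	 ((flush_cache) ? DEMO_FLUSH_CACHE : 0))

/* The original single-argument form is just the "always flush" case. */
#define DEMO_INVAL_OP(type)	DEMO_INVAL_OP_FLUSH(type, true)

int main(void)
{
	uint32_t with_flush = DEMO_INVAL_OP(2);
	uint32_t no_flush = DEMO_INVAL_OP_FLUSH(2, false);	/* PRL present: flush handled elsewhere */

	printf("with flush: 0x%08x\n", with_flush);
	printf("no  flush : 0x%08x\n", no_flush);
	return 0;
}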
+2 -5
drivers/gpu/drm/xe/xe_huc.c
··· 300 300 void xe_huc_print_info(struct xe_huc *huc, struct drm_printer *p) 301 301 { 302 302 struct xe_gt *gt = huc_to_gt(huc); 303 - unsigned int fw_ref; 304 303 305 304 xe_uc_fw_print(&huc->fw, p); 306 305 307 306 if (!xe_uc_fw_is_enabled(&huc->fw)) 308 307 return; 309 308 310 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); 311 - if (!fw_ref) 309 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT); 310 + if (!fw_ref.domains) 312 311 return; 313 312 314 313 drm_printf(p, "\nHuC status: 0x%08x\n", 315 314 xe_mmio_read32(&gt->mmio, HUC_KERNEL_LOAD_INFO)); 316 - 317 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 318 315 }
+1 -2
drivers/gpu/drm/xe/xe_huc_debugfs.c
··· 37 37 struct xe_device *xe = huc_to_xe(huc); 38 38 struct drm_printer p = drm_seq_file_printer(m); 39 39 40 - xe_pm_runtime_get(xe); 40 + guard(xe_pm_runtime)(xe); 41 41 xe_huc_print_info(huc, &p); 42 - xe_pm_runtime_put(xe); 43 42 44 43 return 0; 45 44 }
+6 -10
drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c
··· 43 43 { 44 44 struct xe_device *xe = kobj_to_xe(kobj); 45 45 struct kobj_attribute *kattr; 46 - ssize_t ret = -EIO; 47 46 48 47 kattr = container_of(attr, struct kobj_attribute, attr); 49 48 if (kattr->show) { 50 - xe_pm_runtime_get(xe); 51 - ret = kattr->show(kobj, kattr, buf); 52 - xe_pm_runtime_put(xe); 49 + guard(xe_pm_runtime)(xe); 50 + return kattr->show(kobj, kattr, buf); 53 51 } 54 52 55 - return ret; 53 + return -EIO; 56 54 } 57 55 58 56 static ssize_t xe_hw_engine_class_sysfs_attr_store(struct kobject *kobj, ··· 60 62 { 61 63 struct xe_device *xe = kobj_to_xe(kobj); 62 64 struct kobj_attribute *kattr; 63 - ssize_t ret = -EIO; 64 65 65 66 kattr = container_of(attr, struct kobj_attribute, attr); 66 67 if (kattr->store) { 67 - xe_pm_runtime_get(xe); 68 - ret = kattr->store(kobj, kattr, buf, count); 69 - xe_pm_runtime_put(xe); 68 + guard(xe_pm_runtime)(xe); 69 + return kattr->store(kobj, kattr, buf, count); 70 70 } 71 71 72 - return ret; 72 + return -EIO; 73 73 } 74 74 75 75 static const struct sysfs_ops xe_hw_engine_class_sysfs_ops = {
+71 -8
drivers/gpu/drm/xe/xe_hw_engine_group.c
··· 9 9 #include "xe_device.h" 10 10 #include "xe_exec_queue.h" 11 11 #include "xe_gt.h" 12 + #include "xe_gt_stats.h" 12 13 #include "xe_hw_engine_group.h" 14 + #include "xe_sync.h" 13 15 #include "xe_vm.h" 14 16 15 17 static void ··· 22 20 int err; 23 21 enum xe_hw_engine_group_execution_mode previous_mode; 24 22 25 - err = xe_hw_engine_group_get_mode(group, EXEC_MODE_LR, &previous_mode); 23 + err = xe_hw_engine_group_get_mode(group, EXEC_MODE_LR, &previous_mode, 24 + NULL, 0); 26 25 if (err) 27 26 return; 28 27 ··· 191 188 /** 192 189 * xe_hw_engine_group_suspend_faulting_lr_jobs() - Suspend the faulting LR jobs of this group 193 190 * @group: The hw engine group 191 + * @has_deps: dma-fence job triggering suspend has dependencies 194 192 * 195 193 * Return: 0 on success, negative error code on error. 196 194 */ 197 - static int xe_hw_engine_group_suspend_faulting_lr_jobs(struct xe_hw_engine_group *group) 195 + static int xe_hw_engine_group_suspend_faulting_lr_jobs(struct xe_hw_engine_group *group, 196 + bool has_deps) 198 197 { 199 198 int err; 200 199 struct xe_exec_queue *q; 200 + struct xe_gt *gt = NULL; 201 201 bool need_resume = false; 202 + ktime_t start = xe_gt_stats_ktime_get(); 202 203 203 204 lockdep_assert_held_write(&group->mode_sem); 204 205 205 206 list_for_each_entry(q, &group->exec_queue_list, hw_engine_group_link) { 207 + bool idle_skip_suspend; 208 + 206 209 if (!xe_vm_in_fault_mode(q->vm)) 207 210 continue; 208 211 209 - need_resume = true; 212 + idle_skip_suspend = xe_exec_queue_idle_skip_suspend(q); 213 + if (!idle_skip_suspend && has_deps) 214 + return -EAGAIN; 215 + 216 + xe_gt_stats_incr(q->gt, XE_GT_STATS_ID_HW_ENGINE_GROUP_SUSPEND_LR_QUEUE_COUNT, 1); 217 + if (idle_skip_suspend) 218 + xe_gt_stats_incr(q->gt, 219 + XE_GT_STATS_ID_HW_ENGINE_GROUP_SKIP_LR_QUEUE_COUNT, 1); 220 + 221 + need_resume |= !idle_skip_suspend; 210 222 q->ops->suspend(q); 223 + gt = q->gt; 211 224 } 212 225 213 226 list_for_each_entry(q, &group->exec_queue_list, hw_engine_group_link) { ··· 233 214 err = q->ops->suspend_wait(q); 234 215 if (err) 235 216 return err; 217 + } 218 + 219 + if (gt) { 220 + xe_gt_stats_incr(gt, 221 + XE_GT_STATS_ID_HW_ENGINE_GROUP_SUSPEND_LR_QUEUE_US, 222 + xe_gt_stats_ktime_us_delta(start)); 236 223 } 237 224 238 225 if (need_resume) ··· 261 236 { 262 237 long timeout; 263 238 struct xe_exec_queue *q; 239 + struct xe_gt *gt = NULL; 264 240 struct dma_fence *fence; 241 + ktime_t start = xe_gt_stats_ktime_get(); 265 242 266 243 lockdep_assert_held_write(&group->mode_sem); 267 244 ··· 271 244 if (xe_vm_in_lr_mode(q->vm)) 272 245 continue; 273 246 247 + xe_gt_stats_incr(q->gt, XE_GT_STATS_ID_HW_ENGINE_GROUP_WAIT_DMA_QUEUE_COUNT, 1); 274 248 fence = xe_exec_queue_last_fence_get_for_resume(q, q->vm); 275 249 timeout = dma_fence_wait(fence, false); 276 250 dma_fence_put(fence); 251 + gt = q->gt; 277 252 278 253 if (timeout < 0) 279 254 return -ETIME; 280 255 } 281 256 257 + if (gt) { 258 + xe_gt_stats_incr(gt, 259 + XE_GT_STATS_ID_HW_ENGINE_GROUP_WAIT_DMA_QUEUE_US, 260 + xe_gt_stats_ktime_us_delta(start)); 261 + } 262 + 282 263 return 0; 283 264 } 284 265 285 - static int switch_mode(struct xe_hw_engine_group *group) 266 + static int switch_mode(struct xe_hw_engine_group *group, bool has_deps) 286 267 { 287 268 int err = 0; 288 269 enum xe_hw_engine_group_execution_mode new_mode; ··· 300 265 switch (group->cur_mode) { 301 266 case EXEC_MODE_LR: 302 267 new_mode = EXEC_MODE_DMA_FENCE; 303 - err = xe_hw_engine_group_suspend_faulting_lr_jobs(group); 268 + err = 
xe_hw_engine_group_suspend_faulting_lr_jobs(group, 269 + has_deps); 304 270 break; 305 271 case EXEC_MODE_DMA_FENCE: 306 272 new_mode = EXEC_MODE_LR; ··· 317 281 return 0; 318 282 } 319 283 284 + static int wait_syncs(struct xe_sync_entry *syncs, int num_syncs) 285 + { 286 + int err, i; 287 + 288 + for (i = 0; i < num_syncs; ++i) { 289 + err = xe_sync_entry_wait(syncs + i); 290 + if (err) 291 + return err; 292 + } 293 + 294 + return 0; 295 + } 296 + 320 297 /** 321 298 * xe_hw_engine_group_get_mode() - Get the group to execute in the new mode 322 299 * @group: The hw engine group 323 300 * @new_mode: The new execution mode 324 301 * @previous_mode: Pointer to the previous mode provided for use by caller 302 + * @syncs: Syncs from exec IOCTL 303 + * @num_syncs: Number of syncs from exec IOCTL 325 304 * 326 305 * Return: 0 if successful, -EINTR if locking failed. 327 306 */ 328 307 int xe_hw_engine_group_get_mode(struct xe_hw_engine_group *group, 329 308 enum xe_hw_engine_group_execution_mode new_mode, 330 - enum xe_hw_engine_group_execution_mode *previous_mode) 309 + enum xe_hw_engine_group_execution_mode *previous_mode, 310 + struct xe_sync_entry *syncs, int num_syncs) 331 311 __acquires(&group->mode_sem) 332 312 { 313 + bool has_deps = !!num_syncs; 333 314 int err = down_read_interruptible(&group->mode_sem); 334 315 335 316 if (err) ··· 356 303 357 304 if (new_mode != group->cur_mode) { 358 305 up_read(&group->mode_sem); 306 + retry: 359 307 err = down_write_killable(&group->mode_sem); 360 308 if (err) 361 309 return err; 362 310 363 311 if (new_mode != group->cur_mode) { 364 - err = switch_mode(group); 312 + err = switch_mode(group, has_deps); 365 313 if (err) { 366 314 up_write(&group->mode_sem); 367 - return err; 315 + 316 + if (err != -EAGAIN) 317 + return err; 318 + 319 + err = wait_syncs(syncs, num_syncs); 320 + if (err) 321 + return err; 322 + 323 + has_deps = false; 324 + goto retry; 368 325 } 369 326 } 370 327 downgrade_write(&group->mode_sem);
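When the job triggering the LR to dma-fence switch still has unmet dependencies and a busy long-running queue cannot simply be skipped, switch_mode() now fails with -EAGAIN; the caller then waits on the exec syncs and retries without dependencies. A standalone sketch of that retry flow, with stand-in helpers in place of the real suspend and sync-wait paths:

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for the real helpers; the control flow is what matters here. */
static int try_switch(bool has_deps)
{
	/* A busy long-running queue cannot be suspended while the new job
	 * still has unmet dependencies, so ask the caller to wait first. */
	return has_deps ? -EAGAIN : 0;
}

static int wait_for_deps(void) { return 0; }	/* e.g. a xe_sync_entry_wait() loop */

static int get_mode(bool has_deps)
{
	int err;

retry:
	err = try_switch(has_deps);
	if (err == -EAGAIN) {
		err = wait_for_deps();
		if (err)
			return err;
		has_deps = false;
		goto retry;
	}
	return err;
}

int main(void)
{
	printf("%d\n", get_mode(true));
	return 0;
}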
+3 -1
drivers/gpu/drm/xe/xe_hw_engine_group.h
··· 11 11 struct drm_device; 12 12 struct xe_exec_queue; 13 13 struct xe_gt; 14 + struct xe_sync_entry; 14 15 15 16 int xe_hw_engine_setup_groups(struct xe_gt *gt); 16 17 ··· 20 19 21 20 int xe_hw_engine_group_get_mode(struct xe_hw_engine_group *group, 22 21 enum xe_hw_engine_group_execution_mode new_mode, 23 - enum xe_hw_engine_group_execution_mode *previous_mode); 22 + enum xe_hw_engine_group_execution_mode *previous_mode, 23 + struct xe_sync_entry *syncs, int num_syncs); 24 24 void xe_hw_engine_group_put(struct xe_hw_engine_group *group); 25 25 26 26 enum xe_hw_engine_group_execution_mode
+14 -38
drivers/gpu/drm/xe/xe_hwmon.c
··· 502 502 503 503 int ret = 0; 504 504 505 - xe_pm_runtime_get(hwmon->xe); 505 + guard(xe_pm_runtime)(hwmon->xe); 506 506 507 507 mutex_lock(&hwmon->hwmon_lock); 508 508 ··· 520 520 } 521 521 522 522 mutex_unlock(&hwmon->hwmon_lock); 523 - 524 - xe_pm_runtime_put(hwmon->xe); 525 523 526 524 x = REG_FIELD_GET(PWR_LIM_TIME_X, reg_val); 527 525 y = REG_FIELD_GET(PWR_LIM_TIME_Y, reg_val); ··· 602 604 rxy = REG_FIELD_PREP(PWR_LIM_TIME_X, x) | 603 605 REG_FIELD_PREP(PWR_LIM_TIME_Y, y); 604 606 605 - xe_pm_runtime_get(hwmon->xe); 607 + guard(xe_pm_runtime)(hwmon->xe); 606 608 607 609 mutex_lock(&hwmon->hwmon_lock); 608 610 ··· 613 615 PWR_LIM_TIME, rxy); 614 616 615 617 mutex_unlock(&hwmon->hwmon_lock); 616 - 617 - xe_pm_runtime_put(hwmon->xe); 618 618 619 619 return count; 620 620 } ··· 1120 1124 int channel, long *val) 1121 1125 { 1122 1126 struct xe_hwmon *hwmon = dev_get_drvdata(dev); 1123 - int ret; 1124 1127 1125 - xe_pm_runtime_get(hwmon->xe); 1128 + guard(xe_pm_runtime)(hwmon->xe); 1126 1129 1127 1130 switch (type) { 1128 1131 case hwmon_temp: 1129 - ret = xe_hwmon_temp_read(hwmon, attr, channel, val); 1130 - break; 1132 + return xe_hwmon_temp_read(hwmon, attr, channel, val); 1131 1133 case hwmon_power: 1132 - ret = xe_hwmon_power_read(hwmon, attr, channel, val); 1133 - break; 1134 + return xe_hwmon_power_read(hwmon, attr, channel, val); 1134 1135 case hwmon_curr: 1135 - ret = xe_hwmon_curr_read(hwmon, attr, channel, val); 1136 - break; 1136 + return xe_hwmon_curr_read(hwmon, attr, channel, val); 1137 1137 case hwmon_in: 1138 - ret = xe_hwmon_in_read(hwmon, attr, channel, val); 1139 - break; 1138 + return xe_hwmon_in_read(hwmon, attr, channel, val); 1140 1139 case hwmon_energy: 1141 - ret = xe_hwmon_energy_read(hwmon, attr, channel, val); 1142 - break; 1140 + return xe_hwmon_energy_read(hwmon, attr, channel, val); 1143 1141 case hwmon_fan: 1144 - ret = xe_hwmon_fan_read(hwmon, attr, channel, val); 1145 - break; 1142 + return xe_hwmon_fan_read(hwmon, attr, channel, val); 1146 1143 default: 1147 - ret = -EOPNOTSUPP; 1148 - break; 1144 + return -EOPNOTSUPP; 1149 1145 } 1150 - 1151 - xe_pm_runtime_put(hwmon->xe); 1152 - 1153 - return ret; 1154 1146 } 1155 1147 1156 1148 static int ··· 1146 1162 int channel, long val) 1147 1163 { 1148 1164 struct xe_hwmon *hwmon = dev_get_drvdata(dev); 1149 - int ret; 1150 1165 1151 - xe_pm_runtime_get(hwmon->xe); 1166 + guard(xe_pm_runtime)(hwmon->xe); 1152 1167 1153 1168 switch (type) { 1154 1169 case hwmon_power: 1155 - ret = xe_hwmon_power_write(hwmon, attr, channel, val); 1156 - break; 1170 + return xe_hwmon_power_write(hwmon, attr, channel, val); 1157 1171 case hwmon_curr: 1158 - ret = xe_hwmon_curr_write(hwmon, attr, channel, val); 1159 - break; 1172 + return xe_hwmon_curr_write(hwmon, attr, channel, val); 1160 1173 default: 1161 - ret = -EOPNOTSUPP; 1162 - break; 1174 + return -EOPNOTSUPP; 1163 1175 } 1164 - 1165 - xe_pm_runtime_put(hwmon->xe); 1166 - 1167 - return ret; 1168 1176 } 1169 1177 1170 1178 static int xe_hwmon_read_label(struct device *dev,
+1 -1
drivers/gpu/drm/xe/xe_i2c.c
··· 319 319 struct xe_i2c *i2c; 320 320 int ret; 321 321 322 - if (xe->info.platform != XE_BATTLEMAGE) 322 + if (!xe->info.has_i2c) 323 323 return 0; 324 324 325 325 if (IS_SRIOV_VF(xe))
+2
drivers/gpu/drm/xe/xe_irq.c
··· 21 21 #include "xe_hw_error.h" 22 22 #include "xe_i2c.h" 23 23 #include "xe_memirq.h" 24 + #include "xe_mert.h" 24 25 #include "xe_mmio.h" 25 26 #include "xe_pxp.h" 26 27 #include "xe_sriov.h" ··· 526 525 xe_heci_csc_irq_handler(xe, master_ctl); 527 526 xe_display_irq_handler(xe, master_ctl); 528 527 xe_i2c_irq_handler(xe, master_ctl); 528 + xe_mert_irq_handler(xe, master_ctl); 529 529 gu_misc_iir = gu_misc_irq_ack(xe, master_ctl); 530 530 } 531 531 }
+22 -3
drivers/gpu/drm/xe/xe_lmtt.c
··· 8 8 #include <drm/drm_managed.h> 9 9 10 10 #include "regs/xe_gt_regs.h" 11 + #include "regs/xe_mert_regs.h" 11 12 12 13 #include "xe_assert.h" 13 14 #include "xe_bo.h" 14 15 #include "xe_tlb_inval.h" 15 16 #include "xe_lmtt.h" 16 17 #include "xe_map.h" 18 + #include "xe_mert.h" 17 19 #include "xe_mmio.h" 18 20 #include "xe_res_cursor.h" 19 21 #include "xe_sriov.h" 22 + #include "xe_tile.h" 20 23 #include "xe_tile_sriov_printk.h" 21 24 22 25 /** ··· 199 196 struct xe_device *xe = tile_to_xe(tile); 200 197 dma_addr_t offset = xe_bo_main_addr(lmtt->pd->bo, XE_PAGE_SIZE); 201 198 struct xe_gt *gt; 199 + u32 config; 202 200 u8 id; 203 201 204 202 lmtt_debug(lmtt, "DIR offset %pad\n", &offset); 205 203 lmtt_assert(lmtt, xe_bo_is_vram(lmtt->pd->bo)); 206 204 lmtt_assert(lmtt, IS_ALIGNED(offset, SZ_64K)); 207 205 206 + config = LMEM_EN | REG_FIELD_PREP(LMTT_DIR_PTR, offset / SZ_64K); 207 + 208 208 for_each_gt_on_tile(gt, tile, id) 209 209 xe_mmio_write32(&gt->mmio, 210 210 GRAPHICS_VER(xe) >= 20 ? XE2_LMEM_CFG : LMEM_CFG, 211 - LMEM_EN | REG_FIELD_PREP(LMTT_DIR_PTR, offset / SZ_64K)); 211 + config); 212 + 213 + if (xe_device_has_mert(xe) && xe_tile_is_root(tile)) 214 + xe_mmio_write32(&tile->mmio, MERT_LMEM_CFG, config); 212 215 } 213 216 214 217 /** ··· 271 262 * @lmtt: the &xe_lmtt to invalidate 272 263 * 273 264 * Send requests to all GuCs on this tile to invalidate all TLBs. 265 + * If the platform has a standalone MERT, also invalidate MERT's TLB. 274 266 * 275 267 * This function should be called only when running as a PF driver. 276 268 */ 277 269 void xe_lmtt_invalidate_hw(struct xe_lmtt *lmtt) 278 270 { 271 + struct xe_tile *tile = lmtt_to_tile(lmtt); 272 + struct xe_device *xe = lmtt_to_xe(lmtt); 279 273 int err; 280 274 281 - lmtt_assert(lmtt, IS_SRIOV_PF(lmtt_to_xe(lmtt))); 275 + lmtt_assert(lmtt, IS_SRIOV_PF(xe)); 282 276 283 277 err = lmtt_invalidate_hw(lmtt); 284 278 if (err) 285 - xe_tile_sriov_err(lmtt_to_tile(lmtt), "LMTT invalidation failed (%pe)", 279 + xe_tile_sriov_err(tile, "LMTT invalidation failed (%pe)", 286 280 ERR_PTR(err)); 281 + 282 + if (xe_device_has_mert(xe) && xe_tile_is_root(tile)) { 283 + err = xe_mert_invalidate_lmtt(tile); 284 + if (err) 285 + xe_tile_sriov_err(tile, "MERT LMTT invalidation failed (%pe)", 286 + ERR_PTR(err)); 287 + } 287 288 } 288 289 289 290 static void lmtt_write_pte(struct xe_lmtt *lmtt, struct xe_lmtt_pt *pt,
+63 -10
drivers/gpu/drm/xe/xe_lrc.c
··· 44 44 #define LRC_INDIRECT_CTX_BO_SIZE SZ_4K 45 45 #define LRC_INDIRECT_RING_STATE_SIZE SZ_4K 46 46 47 + #define LRC_PRIORITY GENMASK_ULL(10, 9) 48 + #define LRC_PRIORITY_LOW 0 49 + #define LRC_PRIORITY_NORMAL 1 50 + #define LRC_PRIORITY_HIGH 2 51 + 47 52 /* 48 53 * Layout of the LRC and associated data allocated as 49 54 * lrc->bo: ··· 96 91 return false; 97 92 } 98 93 99 - size_t xe_gt_lrc_size(struct xe_gt *gt, enum xe_engine_class class) 94 + /** 95 + * xe_gt_lrc_hang_replay_size() - Hang replay size 96 + * @gt: The GT 97 + * @class: Hardware engine class 98 + * 99 + * Determine size of GPU hang replay state for a GT and hardware engine class. 100 + * 101 + * Return: Size of GPU hang replay size 102 + */ 103 + size_t xe_gt_lrc_hang_replay_size(struct xe_gt *gt, enum xe_engine_class class) 100 104 { 101 105 struct xe_device *xe = gt_to_xe(gt); 102 - size_t size; 103 - 104 - /* Per-process HW status page (PPHWSP) */ 105 - size = LRC_PPHWSP_SIZE; 106 + size_t size = 0; 106 107 107 108 /* Engine context image */ 108 109 switch (class) { ··· 134 123 size += 1 * SZ_4K; 135 124 } 136 125 126 + return size; 127 + } 128 + 129 + size_t xe_gt_lrc_size(struct xe_gt *gt, enum xe_engine_class class) 130 + { 131 + size_t size = xe_gt_lrc_hang_replay_size(gt, class); 132 + 137 133 /* Add indirect ring state page */ 138 134 if (xe_gt_has_indirect_ring_state(gt)) 139 135 size += LRC_INDIRECT_RING_STATE_SIZE; 140 136 141 - return size; 137 + return size + LRC_PPHWSP_SIZE; 142 138 } 143 139 144 140 /* ··· 1404 1386 return 0; 1405 1387 } 1406 1388 1389 + static u8 xe_multi_queue_prio_to_lrc(struct xe_lrc *lrc, enum xe_multi_queue_priority priority) 1390 + { 1391 + struct xe_device *xe = gt_to_xe(lrc->gt); 1392 + 1393 + xe_assert(xe, (priority >= XE_MULTI_QUEUE_PRIORITY_LOW && 1394 + priority <= XE_MULTI_QUEUE_PRIORITY_HIGH)); 1395 + 1396 + /* xe_multi_queue_priority is directly mapped to LRC priority values */ 1397 + return priority; 1398 + } 1399 + 1400 + /** 1401 + * xe_lrc_set_multi_queue_priority() - Set multi queue priority in LRC 1402 + * @lrc: Logical Ring Context 1403 + * @priority: Multi queue priority of the exec queue 1404 + * 1405 + * Convert @priority to LRC multi queue priority and update the @lrc descriptor 1406 + */ 1407 + void xe_lrc_set_multi_queue_priority(struct xe_lrc *lrc, enum xe_multi_queue_priority priority) 1408 + { 1409 + lrc->desc &= ~LRC_PRIORITY; 1410 + lrc->desc |= FIELD_PREP(LRC_PRIORITY, xe_multi_queue_prio_to_lrc(lrc, priority)); 1411 + } 1412 + 1407 1413 static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe, 1408 - struct xe_vm *vm, u32 ring_size, u16 msix_vec, 1414 + struct xe_vm *vm, void *replay_state, u32 ring_size, 1415 + u16 msix_vec, 1409 1416 u32 init_flags) 1410 1417 { 1411 1418 struct xe_gt *gt = hwe->gt; ··· 1445 1402 1446 1403 kref_init(&lrc->refcount); 1447 1404 lrc->gt = gt; 1405 + lrc->replay_size = xe_gt_lrc_hang_replay_size(gt, hwe->class); 1448 1406 lrc->size = lrc_size; 1449 1407 lrc->flags = 0; 1450 1408 lrc->ring.size = ring_size; ··· 1482 1438 * scratch. 
1483 1439 */ 1484 1440 map = __xe_lrc_pphwsp_map(lrc); 1485 - if (gt->default_lrc[hwe->class]) { 1441 + if (gt->default_lrc[hwe->class] || replay_state) { 1486 1442 xe_map_memset(xe, &map, 0, 0, LRC_PPHWSP_SIZE); /* PPHWSP */ 1487 1443 xe_map_memcpy_to(xe, &map, LRC_PPHWSP_SIZE, 1488 1444 gt->default_lrc[hwe->class] + LRC_PPHWSP_SIZE, 1489 1445 lrc_size - LRC_PPHWSP_SIZE); 1446 + if (replay_state) 1447 + xe_map_memcpy_to(xe, &map, LRC_PPHWSP_SIZE, 1448 + replay_state, lrc->replay_size); 1490 1449 } else { 1491 1450 void *init_data = empty_lrc_data(hwe); 1492 1451 ··· 1597 1550 * xe_lrc_create - Create a LRC 1598 1551 * @hwe: Hardware Engine 1599 1552 * @vm: The VM (address space) 1553 + * @replay_state: GPU hang replay state 1600 1554 * @ring_size: LRC ring size 1601 1555 * @msix_vec: MSI-X interrupt vector (for platforms that support it) 1602 1556 * @flags: LRC initialization flags ··· 1608 1560 * upon failure. 1609 1561 */ 1610 1562 struct xe_lrc *xe_lrc_create(struct xe_hw_engine *hwe, struct xe_vm *vm, 1611 - u32 ring_size, u16 msix_vec, u32 flags) 1563 + void *replay_state, u32 ring_size, u16 msix_vec, u32 flags) 1612 1564 { 1613 1565 struct xe_lrc *lrc; 1614 1566 int err; ··· 1617 1569 if (!lrc) 1618 1570 return ERR_PTR(-ENOMEM); 1619 1571 1620 - err = xe_lrc_init(lrc, hwe, vm, ring_size, msix_vec, flags); 1572 + err = xe_lrc_init(lrc, hwe, vm, replay_state, ring_size, msix_vec, flags); 1621 1573 if (err) { 1622 1574 kfree(lrc); 1623 1575 return ERR_PTR(err); ··· 2283 2235 snapshot->lrc_bo = xe_bo_get(lrc->bo); 2284 2236 snapshot->lrc_offset = xe_lrc_pphwsp_offset(lrc); 2285 2237 snapshot->lrc_size = lrc->size; 2238 + snapshot->replay_offset = 0; 2239 + snapshot->replay_size = lrc->replay_size; 2286 2240 snapshot->lrc_snapshot = NULL; 2287 2241 snapshot->ctx_timestamp = lower_32_bits(xe_lrc_ctx_timestamp(lrc)); 2288 2242 snapshot->ctx_job_timestamp = xe_lrc_ctx_job_timestamp(lrc); ··· 2355 2305 } 2356 2306 2357 2307 drm_printf(p, "\n\t[HWCTX].length: 0x%lx\n", snapshot->lrc_size - LRC_PPHWSP_SIZE); 2308 + drm_printf(p, "\n\t[HWCTX].replay_offset: 0x%lx\n", snapshot->replay_offset); 2309 + drm_printf(p, "\n\t[HWCTX].replay_length: 0x%lx\n", snapshot->replay_size); 2310 + 2358 2311 drm_puts(p, "\t[HWCTX].data: "); 2359 2312 for (; i < snapshot->lrc_size; i += sizeof(u32)) { 2360 2313 u32 *val = snapshot->lrc_snapshot + i;
+6 -1
drivers/gpu/drm/xe/xe_lrc.h
··· 13 13 struct xe_bb; 14 14 struct xe_device; 15 15 struct xe_exec_queue; 16 + enum xe_multi_queue_priority; 16 17 enum xe_engine_class; 17 18 struct xe_gt; 18 19 struct xe_hw_engine; ··· 24 23 struct xe_bo *lrc_bo; 25 24 void *lrc_snapshot; 26 25 unsigned long lrc_size, lrc_offset; 26 + unsigned long replay_size, replay_offset; 27 27 28 28 u32 context_desc; 29 29 u32 ring_addr; ··· 51 49 #define XE_LRC_CREATE_USER_CTX BIT(2) 52 50 53 51 struct xe_lrc *xe_lrc_create(struct xe_hw_engine *hwe, struct xe_vm *vm, 54 - u32 ring_size, u16 msix_vec, u32 flags); 52 + void *replay_state, u32 ring_size, u16 msix_vec, u32 flags); 55 53 void xe_lrc_destroy(struct kref *ref); 56 54 57 55 /** ··· 88 86 return SZ_16K; 89 87 } 90 88 89 + size_t xe_gt_lrc_hang_replay_size(struct xe_gt *gt, enum xe_engine_class class); 91 90 size_t xe_gt_lrc_size(struct xe_gt *gt, enum xe_engine_class class); 92 91 u32 xe_lrc_pphwsp_offset(struct xe_lrc *lrc); 93 92 u32 xe_lrc_regs_offset(struct xe_lrc *lrc); ··· 135 132 enum xe_engine_class); 136 133 137 134 u32 *xe_lrc_emit_hwe_state_instructions(struct xe_exec_queue *q, u32 *cs); 135 + 136 + void xe_lrc_set_multi_queue_priority(struct xe_lrc *lrc, enum xe_multi_queue_priority priority); 138 137 139 138 struct xe_lrc_snapshot *xe_lrc_snapshot_capture(struct xe_lrc *lrc); 140 139 void xe_lrc_snapshot_capture_delayed(struct xe_lrc_snapshot *snapshot);
+3
drivers/gpu/drm/xe/xe_lrc_types.h
··· 25 25 /** @size: size of the lrc and optional indirect ring state */ 26 26 u32 size; 27 27 28 + /** @replay_size: Size LRC needed for replaying a hang */ 29 + u32 replay_size; 30 + 28 31 /** @gt: gt which this LRC belongs to */ 29 32 struct xe_gt *gt; 30 33
··· 25 25 /** @size: size of the lrc and optional indirect ring state */
26 26 u32 size;
27 27 
28 + /** @replay_size: Size of the LRC state needed for replaying a hang */
29 + u32 replay_size;
30 + 
28 31 /** @gt: gt which this LRC belongs to */
29 32 struct xe_gt *gt;
30 33
+82
drivers/gpu/drm/xe/xe_mert.c
··· 1 + // SPDX-License-Identifier: MIT 2 + /* 3 + * Copyright(c) 2025, Intel Corporation. All rights reserved. 4 + */ 5 + 6 + #include "regs/xe_irq_regs.h" 7 + #include "regs/xe_mert_regs.h" 8 + 9 + #include "xe_device.h" 10 + #include "xe_mert.h" 11 + #include "xe_mmio.h" 12 + #include "xe_tile.h" 13 + 14 + /** 15 + * xe_mert_invalidate_lmtt - Invalidate MERT LMTT 16 + * @tile: the &xe_tile 17 + * 18 + * Trigger invalidation of the MERT LMTT and wait for completion. 19 + * 20 + * Return: 0 on success or -ETIMEDOUT in case of a timeout. 21 + */ 22 + int xe_mert_invalidate_lmtt(struct xe_tile *tile) 23 + { 24 + struct xe_device *xe = tile_to_xe(tile); 25 + struct xe_mert *mert = &tile->mert; 26 + const long timeout = HZ / 4; 27 + unsigned long flags; 28 + 29 + xe_assert(xe, xe_device_has_mert(xe)); 30 + xe_assert(xe, xe_tile_is_root(tile)); 31 + 32 + spin_lock_irqsave(&mert->lock, flags); 33 + if (!mert->tlb_inv_triggered) { 34 + mert->tlb_inv_triggered = true; 35 + reinit_completion(&mert->tlb_inv_done); 36 + xe_mmio_write32(&tile->mmio, MERT_TLB_INV_DESC_A, MERT_TLB_INV_DESC_A_VALID); 37 + } 38 + spin_unlock_irqrestore(&mert->lock, flags); 39 + 40 + if (!wait_for_completion_timeout(&mert->tlb_inv_done, timeout)) 41 + return -ETIMEDOUT; 42 + 43 + return 0; 44 + } 45 + 46 + /** 47 + * xe_mert_irq_handler - Handler for MERT interrupts 48 + * @xe: the &xe_device 49 + * @master_ctl: interrupt register 50 + * 51 + * Handle interrupts generated by MERT. 52 + */ 53 + void xe_mert_irq_handler(struct xe_device *xe, u32 master_ctl) 54 + { 55 + struct xe_tile *tile = xe_device_get_root_tile(xe); 56 + unsigned long flags; 57 + u32 reg_val; 58 + u8 err; 59 + 60 + if (!(master_ctl & SOC_H2DMEMINT_IRQ)) 61 + return; 62 + 63 + reg_val = xe_mmio_read32(&tile->mmio, MERT_TLB_CT_INTR_ERR_ID_PORT); 64 + xe_mmio_write32(&tile->mmio, MERT_TLB_CT_INTR_ERR_ID_PORT, 0); 65 + 66 + err = FIELD_GET(MERT_TLB_CT_ERROR_MASK, reg_val); 67 + if (err == MERT_TLB_CT_LMTT_FAULT) 68 + drm_dbg(&xe->drm, "MERT catastrophic error: LMTT fault (VF%u)\n", 69 + FIELD_GET(MERT_TLB_CT_VFID_MASK, reg_val)); 70 + else if (err) 71 + drm_dbg(&xe->drm, "MERT catastrophic error: Unexpected fault (0x%x)\n", err); 72 + 73 + spin_lock_irqsave(&tile->mert.lock, flags); 74 + if (tile->mert.tlb_inv_triggered) { 75 + reg_val = xe_mmio_read32(&tile->mmio, MERT_TLB_INV_DESC_A); 76 + if (!(reg_val & MERT_TLB_INV_DESC_A_VALID)) { 77 + tile->mert.tlb_inv_triggered = false; 78 + complete_all(&tile->mert.tlb_inv_done); 79 + } 80 + } 81 + spin_unlock_irqrestore(&tile->mert.lock, flags); 82 + }
+32
drivers/gpu/drm/xe/xe_mert.h
··· 1 + /* SPDX-License-Identifier: MIT */
2 + /*
3 + * Copyright(c) 2025, Intel Corporation. All rights reserved.
4 + */
5 + 
6 + #ifndef __XE_MERT_H__
7 + #define __XE_MERT_H__
8 + 
9 + #include <linux/completion.h>
10 + #include <linux/spinlock.h>
11 + #include <linux/types.h>
12 + 
13 + struct xe_device;
14 + struct xe_tile;
15 + 
16 + struct xe_mert {
17 + /** @lock: protects the TLB invalidation status */
18 + spinlock_t lock;
19 + /** @tlb_inv_triggered: indicates if TLB invalidation was triggered */
20 + bool tlb_inv_triggered;
21 + /** @tlb_inv_done: completion of TLB invalidation */
22 + struct completion tlb_inv_done;
23 + };
24 + 
25 + #ifdef CONFIG_PCI_IOV
26 + int xe_mert_invalidate_lmtt(struct xe_tile *tile);
27 + void xe_mert_irq_handler(struct xe_device *xe, u32 master_ctl);
28 + #else
29 + static inline void xe_mert_irq_handler(struct xe_device *xe, u32 master_ctl) { }
30 + #endif
31 + 
32 + #endif /* __XE_MERT_H__ */
+54 -3
drivers/gpu/drm/xe/xe_migrate.c
··· 34 34 #include "xe_res_cursor.h" 35 35 #include "xe_sa.h" 36 36 #include "xe_sched_job.h" 37 + #include "xe_sriov_vf_ccs.h" 37 38 #include "xe_sync.h" 38 39 #include "xe_trace_bo.h" 39 40 #include "xe_validation.h" ··· 1104 1103 u32 batch_size, batch_size_allocated; 1105 1104 struct xe_device *xe = gt_to_xe(gt); 1106 1105 struct xe_res_cursor src_it, ccs_it; 1106 + struct xe_sriov_vf_ccs_ctx *ctx; 1107 + struct xe_sa_manager *bb_pool; 1107 1108 u64 size = xe_bo_size(src_bo); 1108 1109 struct xe_bb *bb = NULL; 1109 1110 u64 src_L0, src_L0_ofs; 1110 1111 u32 src_L0_pt; 1111 1112 int err; 1113 + 1114 + ctx = &xe->sriov.vf.ccs.contexts[read_write]; 1112 1115 1113 1116 xe_res_first_sg(xe_bo_sg(src_bo), 0, size, &src_it); 1114 1117 ··· 1146 1141 size -= src_L0; 1147 1142 } 1148 1143 1144 + bb_pool = ctx->mem.ccs_bb_pool; 1145 + guard(mutex) (xe_sa_bo_swap_guard(bb_pool)); 1146 + xe_sa_bo_swap_shadow(bb_pool); 1147 + 1149 1148 bb = xe_bb_ccs_new(gt, batch_size, read_write); 1150 1149 if (IS_ERR(bb)) { 1151 1150 drm_err(&xe->drm, "BB allocation failed.\n"); 1152 1151 err = PTR_ERR(bb); 1153 - goto err_ret; 1152 + return err; 1154 1153 } 1155 1154 1156 1155 batch_size_allocated = batch_size; ··· 1203 1194 xe_assert(xe, (batch_size_allocated == bb->len)); 1204 1195 src_bo->bb_ccs[read_write] = bb; 1205 1196 1197 + xe_sriov_vf_ccs_rw_update_bb_addr(ctx); 1198 + xe_sa_bo_sync_shadow(bb->bo); 1206 1199 return 0; 1200 + } 1207 1201 1208 - err_ret: 1209 - return err; 1202 + /** 1203 + * xe_migrate_ccs_rw_copy_clear() - Clear the CCS read/write batch buffer 1204 + * content. 1205 + * @src_bo: The buffer object @src is currently bound to. 1206 + * @read_write : Creates BB commands for CCS read/write. 1207 + * 1208 + * Directly clearing the BB lacks atomicity and can lead to undefined 1209 + * behavior if the vCPU is halted mid-operation during the clearing 1210 + * process. To avoid this issue, we use a shadow buffer object approach. 1211 + * 1212 + * First swap the SA BO address with the shadow BO, perform the clearing 1213 + * operation on the BB, update the shadow BO in the ring buffer, then 1214 + * sync the shadow and the actual buffer to maintain consistency. 1215 + * 1216 + * Returns: None. 1217 + */ 1218 + void xe_migrate_ccs_rw_copy_clear(struct xe_bo *src_bo, 1219 + enum xe_sriov_vf_ccs_rw_ctxs read_write) 1220 + { 1221 + struct xe_bb *bb = src_bo->bb_ccs[read_write]; 1222 + struct xe_device *xe = xe_bo_device(src_bo); 1223 + struct xe_sriov_vf_ccs_ctx *ctx; 1224 + struct xe_sa_manager *bb_pool; 1225 + u32 *cs; 1226 + 1227 + xe_assert(xe, IS_SRIOV_VF(xe)); 1228 + 1229 + ctx = &xe->sriov.vf.ccs.contexts[read_write]; 1230 + bb_pool = ctx->mem.ccs_bb_pool; 1231 + 1232 + guard(mutex) (xe_sa_bo_swap_guard(bb_pool)); 1233 + xe_sa_bo_swap_shadow(bb_pool); 1234 + 1235 + cs = xe_sa_bo_cpu_addr(bb->bo); 1236 + memset(cs, MI_NOOP, bb->len * sizeof(u32)); 1237 + xe_sriov_vf_ccs_rw_update_bb_addr(ctx); 1238 + 1239 + xe_sa_bo_sync_shadow(bb->bo); 1240 + 1241 + xe_bb_free(bb, NULL); 1242 + src_bo->bb_ccs[read_write] = NULL; 1210 1243 } 1211 1244 1212 1245 /**
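The kernel-doc above describes the shadow-BB scheme: edit the inactive copy, repoint the consumer at it, then copy back so both stay consistent. A small standalone analogue of that publish-then-resync pattern, with C11 atomics standing in for the ring-buffer address update (buffer names and sizes are demo-only):

#include <stdatomic.h>
#include <stdio.h>
#include <string.h>

#define DEMO_BB_WORDS 4

static unsigned int copy_a[DEMO_BB_WORDS] = { 1, 2, 3, 4 };
static unsigned int copy_b[DEMO_BB_WORDS];
static _Atomic(unsigned int *) published;

/* Edit the inactive copy, publish it, then re-sync so both copies match. */
static void rewrite_bb(unsigned int fill)
{
	unsigned int *live = atomic_load(&published);
	unsigned int *shadow = (live == copy_a) ? copy_b : copy_a;

	for (int i = 0; i < DEMO_BB_WORDS; i++)
		shadow[i] = fill;

	atomic_store(&published, shadow);	/* consumers switch to the new copy */
	memcpy(live, shadow, sizeof(copy_a));	/* keep the pair consistent again */
}

int main(void)
{
	atomic_init(&published, copy_a);
	rewrite_bb(0);		/* analogue of clearing the CCS BB to MI_NOOPs */
	printf("published[0] = %u\n", atomic_load(&published)[0]);
	return 0;
}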
+3
drivers/gpu/drm/xe/xe_migrate.h
··· 134 134 struct xe_bo *src_bo, 135 135 enum xe_sriov_vf_ccs_rw_ctxs read_write); 136 136 137 + void xe_migrate_ccs_rw_copy_clear(struct xe_bo *src_bo, 138 + enum xe_sriov_vf_ccs_rw_ctxs read_write); 139 + 137 140 struct xe_lrc *xe_migrate_lrc(struct xe_migrate *migrate); 138 141 struct xe_exec_queue *xe_migrate_exec_queue(struct xe_migrate *migrate); 139 142 struct dma_fence *xe_migrate_vram_copy_chunk(struct xe_bo *vram_bo, u64 vram_offset,
+6 -12
drivers/gpu/drm/xe/xe_mocs.c
··· 811 811 struct xe_device *xe = gt_to_xe(gt); 812 812 enum xe_force_wake_domains domain; 813 813 struct xe_mocs_info table; 814 - unsigned int fw_ref, flags; 815 - int err = 0; 814 + unsigned int flags; 816 815 817 816 flags = get_mocs_settings(xe, &table); 818 817 819 818 domain = flags & HAS_LNCF_MOCS ? XE_FORCEWAKE_ALL : XE_FW_GT; 820 - xe_pm_runtime_get_noresume(xe); 821 - fw_ref = xe_force_wake_get(gt_to_fw(gt), domain); 822 819 823 - if (!xe_force_wake_ref_has_domain(fw_ref, domain)) { 824 - err = -ETIMEDOUT; 825 - goto err_fw; 826 - } 820 + guard(xe_pm_runtime_noresume)(xe); 821 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), domain); 822 + if (!xe_force_wake_ref_has_domain(fw_ref.domains, domain)) 823 + return -ETIMEDOUT; 827 824 828 825 table.ops->dump(&table, flags, gt, p); 829 826 830 - err_fw: 831 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 832 - xe_pm_runtime_put(xe); 833 - return err; 827 + return 0; 834 828 } 835 829 836 830 #if IS_ENABLED(CONFIG_DRM_XE_KUNIT_TEST)
+23 -11
drivers/gpu/drm/xe/xe_nvm.c
··· 10 10 #include "xe_device_types.h" 11 11 #include "xe_mmio.h" 12 12 #include "xe_nvm.h" 13 + #include "xe_pcode_api.h" 13 14 #include "regs/xe_gsc_regs.h" 14 15 #include "xe_sriov.h" 15 16 ··· 46 45 { 47 46 struct xe_mmio *mmio = xe_root_tile_mmio(xe); 48 47 49 - if (xe->info.platform != XE_BATTLEMAGE) 48 + switch (xe->info.platform) { 49 + case XE_CRESCENTISLAND: 50 + case XE_BATTLEMAGE: 51 + return !(xe_mmio_read32(mmio, XE_REG(GEN12_CNTL_PROTECTED_NVM_REG)) & 52 + NVM_NON_POSTED_ERASE_CHICKEN_BIT); 53 + default: 50 54 return false; 51 - return !(xe_mmio_read32(mmio, XE_REG(GEN12_CNTL_PROTECTED_NVM_REG)) & 52 - NVM_NON_POSTED_ERASE_CHICKEN_BIT); 55 + } 53 56 } 54 57 55 58 static bool xe_nvm_writable_override(struct xe_device *xe) 56 59 { 57 60 struct xe_mmio *mmio = xe_root_tile_mmio(xe); 58 61 bool writable_override; 59 - resource_size_t base; 62 + struct xe_reg reg; 63 + u32 test_bit; 60 64 61 65 switch (xe->info.platform) { 66 + case XE_CRESCENTISLAND: 67 + reg = PCODE_SCRATCH(0); 68 + test_bit = FDO_MODE; 69 + break; 62 70 case XE_BATTLEMAGE: 63 - base = DG2_GSC_HECI2_BASE; 71 + reg = HECI_FWSTS2(DG2_GSC_HECI2_BASE); 72 + test_bit = HECI_FW_STATUS_2_NVM_ACCESS_MODE; 64 73 break; 65 74 case XE_PVC: 66 - base = PVC_GSC_HECI2_BASE; 75 + reg = HECI_FWSTS2(PVC_GSC_HECI2_BASE); 76 + test_bit = HECI_FW_STATUS_2_NVM_ACCESS_MODE; 67 77 break; 68 78 case XE_DG2: 69 - base = DG2_GSC_HECI2_BASE; 79 + reg = HECI_FWSTS2(DG2_GSC_HECI2_BASE); 80 + test_bit = HECI_FW_STATUS_2_NVM_ACCESS_MODE; 70 81 break; 71 82 case XE_DG1: 72 - base = DG1_GSC_HECI2_BASE; 83 + reg = HECI_FWSTS2(DG1_GSC_HECI2_BASE); 84 + test_bit = HECI_FW_STATUS_2_NVM_ACCESS_MODE; 73 85 break; 74 86 default: 75 87 drm_err(&xe->drm, "Unknown platform\n"); 76 88 return true; 77 89 } 78 90 79 - writable_override = 80 - !(xe_mmio_read32(mmio, HECI_FWSTS2(base)) & 81 - HECI_FW_STATUS_2_NVM_ACCESS_MODE); 91 + writable_override = !(xe_mmio_read32(mmio, reg) & test_bit); 82 92 if (writable_override) 83 93 drm_info(&xe->drm, "NVM access overridden by jumper\n"); 84 94 return writable_override;
+65 -29
drivers/gpu/drm/xe/xe_oa.c
··· 1941 1941 type == DRM_XE_OA_FMT_TYPE_OAC || type == DRM_XE_OA_FMT_TYPE_PEC; 1942 1942 case DRM_XE_OA_UNIT_TYPE_OAM: 1943 1943 case DRM_XE_OA_UNIT_TYPE_OAM_SAG: 1944 + case DRM_XE_OA_UNIT_TYPE_MERT: 1944 1945 return type == DRM_XE_OA_FMT_TYPE_OAM || type == DRM_XE_OA_FMT_TYPE_OAM_MPEC; 1945 1946 default: 1946 1947 return false; ··· 1966 1965 struct xe_hw_engine *hwe; 1967 1966 enum xe_hw_engine_id id; 1968 1967 int ret = 0; 1969 - 1970 - /* If not provided, OA unit defaults to OA unit 0 as per uapi */ 1971 - if (!param->oa_unit) 1972 - param->oa_unit = &xe_root_mmio_gt(oa->xe)->oa.oa_unit[0]; 1973 1968 1974 1969 /* When we have an exec_q, get hwe from the exec_q */ 1975 1970 if (param->exec_q) { ··· 2032 2035 if (ret) 2033 2036 return ret; 2034 2037 2038 + /* If not provided, OA unit defaults to OA unit 0 as per uapi */ 2039 + if (!param.oa_unit) 2040 + param.oa_unit = &xe_root_mmio_gt(oa->xe)->oa.oa_unit[0]; 2041 + 2035 2042 if (param.exec_queue_id > 0) { 2043 + /* An exec_queue is only needed for OAR/OAC functionality on OAG */ 2044 + if (XE_IOCTL_DBG(oa->xe, param.oa_unit->type != DRM_XE_OA_UNIT_TYPE_OAG)) 2045 + return -EINVAL; 2046 + 2036 2047 param.exec_q = xe_exec_queue_lookup(xef, param.exec_queue_id); 2037 2048 if (XE_IOCTL_DBG(oa->xe, !param.exec_q)) 2038 2049 return -ENOENT; ··· 2229 2224 { .start = 0xE18C, .end = 0xE18C }, /* SAMPLER_MODE */ 2230 2225 { .start = 0xE590, .end = 0xE590 }, /* TDL_LSC_LAT_MEASURE_TDL_GFX */ 2231 2226 { .start = 0x13000, .end = 0x137FC }, /* PES_0_PESL0 - PES_63_UPPER_PESL3 */ 2227 + { .start = 0x145194, .end = 0x145194 }, /* SYS_MEM_LAT_MEASURE */ 2228 + { .start = 0x145340, .end = 0x14537C }, /* MERTSS_PES_0 - MERTSS_PES_7 */ 2232 2229 {}, 2233 2230 }; 2234 2231 ··· 2522 2515 static u32 num_oa_units_per_gt(struct xe_gt *gt) 2523 2516 { 2524 2517 if (xe_gt_is_main_type(gt) || GRAPHICS_VER(gt_to_xe(gt)) < 20) 2525 - return 1; 2518 + /* 2519 + * Mert OA unit belongs to the SoC, not a gt, so should be accessed using 2520 + * xe_root_tile_mmio(). However, for all known platforms this is the same as 2521 + * accessing via xe_root_mmio_gt()->mmio. 2522 + */ 2523 + return xe_device_has_mert(gt_to_xe(gt)) ? 
2 : 1; 2526 2524 else if (!IS_DGFX(gt_to_xe(gt))) 2527 2525 return XE_OAM_UNIT_SCMI_0 + 1; /* SAG + SCMI_0 */ 2528 2526 else ··· 2582 2570 static struct xe_oa_regs __oam_regs(u32 base) 2583 2571 { 2584 2572 return (struct xe_oa_regs) { 2585 - base, 2586 - OAM_HEAD_POINTER(base), 2587 - OAM_TAIL_POINTER(base), 2588 - OAM_BUFFER(base), 2589 - OAM_CONTEXT_CONTROL(base), 2590 - OAM_CONTROL(base), 2591 - OAM_DEBUG(base), 2592 - OAM_STATUS(base), 2593 - OAM_CONTROL_COUNTER_SEL_MASK, 2573 + .base = base, 2574 + .oa_head_ptr = OAM_HEAD_POINTER(base), 2575 + .oa_tail_ptr = OAM_TAIL_POINTER(base), 2576 + .oa_buffer = OAM_BUFFER(base), 2577 + .oa_ctx_ctrl = OAM_CONTEXT_CONTROL(base), 2578 + .oa_ctrl = OAM_CONTROL(base), 2579 + .oa_debug = OAM_DEBUG(base), 2580 + .oa_status = OAM_STATUS(base), 2581 + .oa_mmio_trg = OAM_MMIO_TRG(base), 2582 + .oa_ctrl_counter_select_mask = OAM_CONTROL_COUNTER_SEL_MASK, 2594 2583 }; 2595 2584 } 2596 2585 2597 2586 static struct xe_oa_regs __oag_regs(void) 2598 2587 { 2599 2588 return (struct xe_oa_regs) { 2600 - 0, 2601 - OAG_OAHEADPTR, 2602 - OAG_OATAILPTR, 2603 - OAG_OABUFFER, 2604 - OAG_OAGLBCTXCTRL, 2605 - OAG_OACONTROL, 2606 - OAG_OA_DEBUG, 2607 - OAG_OASTATUS, 2608 - OAG_OACONTROL_OA_COUNTER_SEL_MASK, 2589 + .base = 0, 2590 + .oa_head_ptr = OAG_OAHEADPTR, 2591 + .oa_tail_ptr = OAG_OATAILPTR, 2592 + .oa_buffer = OAG_OABUFFER, 2593 + .oa_ctx_ctrl = OAG_OAGLBCTXCTRL, 2594 + .oa_ctrl = OAG_OACONTROL, 2595 + .oa_debug = OAG_OA_DEBUG, 2596 + .oa_status = OAG_OASTATUS, 2597 + .oa_mmio_trg = OAG_MMIOTRIGGER, 2598 + .oa_ctrl_counter_select_mask = OAG_OACONTROL_OA_COUNTER_SEL_MASK, 2599 + }; 2600 + } 2601 + 2602 + static struct xe_oa_regs __oamert_regs(void) 2603 + { 2604 + return (struct xe_oa_regs) { 2605 + .base = 0, 2606 + .oa_head_ptr = OAMERT_HEAD_POINTER, 2607 + .oa_tail_ptr = OAMERT_TAIL_POINTER, 2608 + .oa_buffer = OAMERT_BUFFER, 2609 + .oa_ctx_ctrl = OAMERT_CONTEXT_CONTROL, 2610 + .oa_ctrl = OAMERT_CONTROL, 2611 + .oa_debug = OAMERT_DEBUG, 2612 + .oa_status = OAMERT_STATUS, 2613 + .oa_mmio_trg = OAMERT_MMIO_TRG, 2614 + .oa_ctrl_counter_select_mask = OAM_CONTROL_COUNTER_SEL_MASK, 2609 2615 }; 2610 2616 } 2611 2617 2612 2618 static void __xe_oa_init_oa_units(struct xe_gt *gt) 2613 2619 { 2614 - /* Actual address is MEDIA_GT_GSI_OFFSET + oam_base_addr[i] */ 2615 2620 const u32 oam_base_addr[] = { 2616 - [XE_OAM_UNIT_SAG] = 0x13000, 2617 - [XE_OAM_UNIT_SCMI_0] = 0x14000, 2618 - [XE_OAM_UNIT_SCMI_1] = 0x14800, 2621 + [XE_OAM_UNIT_SAG] = XE_OAM_SAG_BASE, 2622 + [XE_OAM_UNIT_SCMI_0] = XE_OAM_SCMI_0_BASE, 2623 + [XE_OAM_UNIT_SCMI_1] = XE_OAM_SCMI_1_BASE, 2619 2624 }; 2620 2625 int i, num_units = gt->oa.num_oa_units; 2621 2626 ··· 2640 2611 struct xe_oa_unit *u = &gt->oa.oa_unit[i]; 2641 2612 2642 2613 if (xe_gt_is_main_type(gt)) { 2643 - u->regs = __oag_regs(); 2644 - u->type = DRM_XE_OA_UNIT_TYPE_OAG; 2614 + if (!i) { 2615 + u->regs = __oag_regs(); 2616 + u->type = DRM_XE_OA_UNIT_TYPE_OAG; 2617 + } else { 2618 + xe_gt_assert(gt, xe_device_has_mert(gt_to_xe(gt))); 2619 + xe_gt_assert(gt, gt == xe_root_mmio_gt(gt_to_xe(gt))); 2620 + u->regs = __oamert_regs(); 2621 + u->type = DRM_XE_OA_UNIT_TYPE_MERT; 2622 + } 2645 2623 } else { 2646 2624 xe_gt_assert(gt, GRAPHICS_VERx100(gt_to_xe(gt)) >= 1270); 2647 2625 u->regs = __oam_regs(oam_base_addr[i]);
+1
drivers/gpu/drm/xe/xe_oa_types.h
··· 87 87 struct xe_reg oa_ctrl; 88 88 struct xe_reg oa_debug; 89 89 struct xe_reg oa_status; 90 + struct xe_reg oa_mmio_trg; 90 91 u32 oa_ctrl_counter_select_mask; 91 92 }; 92 93
+136
drivers/gpu/drm/xe/xe_page_reclaim.c
··· 1 + // SPDX-License-Identifier: MIT 2 + /* 3 + * Copyright © 2025 Intel Corporation 4 + */ 5 + 6 + #include <linux/bitfield.h> 7 + #include <linux/kref.h> 8 + #include <linux/mm.h> 9 + #include <linux/slab.h> 10 + 11 + #include "xe_page_reclaim.h" 12 + 13 + #include "regs/xe_gt_regs.h" 14 + #include "xe_assert.h" 15 + #include "xe_macros.h" 16 + #include "xe_mmio.h" 17 + #include "xe_pat.h" 18 + #include "xe_sa.h" 19 + #include "xe_tlb_inval_types.h" 20 + #include "xe_vm.h" 21 + 22 + /** 23 + * xe_page_reclaim_skip() - Decide whether PRL should be skipped for a VMA 24 + * @tile: Tile owning the VMA 25 + * @vma: VMA under consideration 26 + * 27 + * PPC flushing may be handled by HW for specific PAT encodings. 28 + * Skip PPC flushing/Page Reclaim for scenarios below due to redundant 29 + * flushes. 30 + * - pat_index is transient display (1) 31 + * 32 + * Return: true when page reclamation is unnecessary, false otherwise. 33 + */ 34 + bool xe_page_reclaim_skip(struct xe_tile *tile, struct xe_vma *vma) 35 + { 36 + u8 l3_policy; 37 + 38 + l3_policy = xe_pat_index_get_l3_policy(tile->xe, vma->attr.pat_index); 39 + 40 + /* 41 + * - l3_policy: 0=WB, 1=XD ("WB - Transient Display"), 3=UC 42 + * Transient display flushes is taken care by HW, l3_policy = 1. 43 + * 44 + * HW will sequence these transient flushes at various sync points so 45 + * any event of page reclamation will hit these sync points before 46 + * page reclamation could execute. 47 + */ 48 + return (l3_policy == XE_L3_POLICY_XD); 49 + } 50 + 51 + /** 52 + * xe_page_reclaim_create_prl_bo() - Back a PRL with a suballocated GGTT BO 53 + * @tlb_inval: TLB invalidation frontend associated with the request 54 + * @prl: page reclaim list data that bo will copy from 55 + * @fence: tlb invalidation fence that page reclaim action is paired to 56 + * 57 + * Suballocates a 4K BO out of the tile reclaim pool, copies the PRL CPU 58 + * copy into the BO and queues the buffer for release when @fence signals. 59 + * 60 + * Return: struct drm_suballoc pointer on success or ERR_PTR on failure. 61 + */ 62 + struct drm_suballoc *xe_page_reclaim_create_prl_bo(struct xe_tlb_inval *tlb_inval, 63 + struct xe_page_reclaim_list *prl, 64 + struct xe_tlb_inval_fence *fence) 65 + { 66 + struct xe_gt *gt = container_of(tlb_inval, struct xe_gt, tlb_inval); 67 + struct xe_tile *tile = gt_to_tile(gt); 68 + /* (+1) for NULL page_reclaim_entry to indicate end of list */ 69 + int prl_size = min(prl->num_entries + 1, XE_PAGE_RECLAIM_MAX_ENTRIES) * 70 + sizeof(struct xe_guc_page_reclaim_entry); 71 + struct drm_suballoc *prl_sa; 72 + 73 + /* Maximum size of PRL is 1 4K-page */ 74 + prl_sa = __xe_sa_bo_new(tile->mem.reclaim_pool, 75 + prl_size, GFP_ATOMIC); 76 + if (IS_ERR(prl_sa)) 77 + return prl_sa; 78 + 79 + memcpy(xe_sa_bo_cpu_addr(prl_sa), prl->entries, 80 + prl_size); 81 + xe_sa_bo_flush_write(prl_sa); 82 + /* Queue up sa_bo_free on tlb invalidation fence signal */ 83 + xe_sa_bo_free(prl_sa, &fence->base); 84 + 85 + return prl_sa; 86 + } 87 + 88 + /** 89 + * xe_page_reclaim_list_invalidate() - Mark a PRL as invalid 90 + * @prl: Page reclaim list to reset 91 + * 92 + * Clears the entries pointer and marks the list as invalid so 93 + * future use knows PRL is unusable. It is expected that the entries 94 + * have already been released. 
95 + */ 96 + void xe_page_reclaim_list_invalidate(struct xe_page_reclaim_list *prl) 97 + { 98 + xe_page_reclaim_entries_put(prl->entries); 99 + prl->entries = NULL; 100 + prl->num_entries = XE_PAGE_RECLAIM_INVALID_LIST; 101 + } 102 + 103 + /** 104 + * xe_page_reclaim_list_init() - Initialize a page reclaim list 105 + * @prl: Page reclaim list to initialize 106 + * 107 + * NULLs both values in list to prepare on initalization. 108 + */ 109 + void xe_page_reclaim_list_init(struct xe_page_reclaim_list *prl) 110 + { 111 + // xe_page_reclaim_list_invalidate(prl); 112 + prl->entries = NULL; 113 + prl->num_entries = 0; 114 + } 115 + 116 + /** 117 + * xe_page_reclaim_list_alloc_entries() - Allocate page reclaim list entries 118 + * @prl: Page reclaim list to allocate entries for 119 + * 120 + * Allocate one 4K page for the PRL entries, otherwise assign prl->entries to NULL. 121 + */ 122 + int xe_page_reclaim_list_alloc_entries(struct xe_page_reclaim_list *prl) 123 + { 124 + struct page *page; 125 + 126 + if (XE_WARN_ON(prl->entries)) 127 + return 0; 128 + 129 + page = alloc_page(GFP_KERNEL | __GFP_ZERO); 130 + if (page) { 131 + prl->entries = page_address(page); 132 + prl->num_entries = 0; 133 + } 134 + 135 + return page ? 0 : -ENOMEM; 136 + }
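The PRL suballocation above is sized as min(num_entries + 1, XE_PAGE_RECLAIM_MAX_ENTRIES) packed qwords, which by construction never exceeds one 4K page. A standalone check of that arithmetic, using the values declared in the header:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_MAX_ENTRIES	512
#define DEMO_LIST_MAX_SIZE	4096	/* SZ_4K */

struct demo_prl_entry { uint64_t qw; };	/* one packed qword per entry */

/* The clamp in xe_page_reclaim_create_prl_bo() keeps the copy within one page. */
static size_t prl_bytes(int num_entries)
{
	int n = num_entries + 1;	/* +1 for the terminating NULL entry */

	if (n > DEMO_MAX_ENTRIES)
		n = DEMO_MAX_ENTRIES;
	return (size_t)n * sizeof(struct demo_prl_entry);
}

int main(void)
{
	static_assert(DEMO_MAX_ENTRIES * sizeof(struct demo_prl_entry) == DEMO_LIST_MAX_SIZE,
		      "512 qword entries fill exactly one 4K page");
	printf("3 entries   -> %zu bytes\n", prl_bytes(3));	/* 32   */
	printf("600 entries -> %zu bytes\n", prl_bytes(600));	/* 4096 */
	return 0;
}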
+105
drivers/gpu/drm/xe/xe_page_reclaim.h
··· 1 + /* SPDX-License-Identifier: MIT */ 2 + /* 3 + * Copyright © 2025 Intel Corporation 4 + */ 5 + 6 + #ifndef _XE_PAGE_RECLAIM_H_ 7 + #define _XE_PAGE_RECLAIM_H_ 8 + 9 + #include <linux/kref.h> 10 + #include <linux/mm.h> 11 + #include <linux/slab.h> 12 + #include <linux/types.h> 13 + #include <linux/workqueue.h> 14 + #include <linux/bits.h> 15 + 16 + #define XE_PAGE_RECLAIM_MAX_ENTRIES 512 17 + #define XE_PAGE_RECLAIM_LIST_MAX_SIZE SZ_4K 18 + 19 + struct xe_tlb_inval; 20 + struct xe_tlb_inval_fence; 21 + struct xe_tile; 22 + struct xe_vma; 23 + 24 + struct xe_guc_page_reclaim_entry { 25 + u64 qw; 26 + /* valid reclaim entry bit */ 27 + #define XE_PAGE_RECLAIM_VALID BIT_ULL(0) 28 + /* 29 + * offset order of page size to be reclaimed 30 + * page_size = 1 << (XE_PTE_SHIFT + reclamation_size) 31 + */ 32 + #define XE_PAGE_RECLAIM_SIZE GENMASK_ULL(6, 1) 33 + #define XE_PAGE_RECLAIM_RSVD_0 GENMASK_ULL(11, 7) 34 + /* lower 20 bits of the physical address */ 35 + #define XE_PAGE_RECLAIM_ADDR_LO GENMASK_ULL(31, 12) 36 + /* upper 20 bits of the physical address */ 37 + #define XE_PAGE_RECLAIM_ADDR_HI GENMASK_ULL(51, 32) 38 + #define XE_PAGE_RECLAIM_RSVD_1 GENMASK_ULL(63, 52) 39 + } __packed; 40 + 41 + struct xe_page_reclaim_list { 42 + /** @entries: array of page reclaim entries, page allocated */ 43 + struct xe_guc_page_reclaim_entry *entries; 44 + /** @num_entries: number of entries */ 45 + int num_entries; 46 + #define XE_PAGE_RECLAIM_INVALID_LIST -1 47 + }; 48 + 49 + /** 50 + * xe_page_reclaim_list_is_new() - Check if PRL is new allocation 51 + * @prl: Pointer to page reclaim list 52 + * 53 + * PRL indicates it hasn't been allocated through both values being NULL 54 + */ 55 + static inline bool xe_page_reclaim_list_is_new(struct xe_page_reclaim_list *prl) 56 + { 57 + return !prl->entries && prl->num_entries == 0; 58 + } 59 + 60 + /** 61 + * xe_page_reclaim_list_valid() - Check if the page reclaim list is valid 62 + * @prl: Pointer to page reclaim list 63 + * 64 + * PRL uses the XE_PAGE_RECLAIM_INVALID_LIST to indicate that a PRL 65 + * is unusable. 66 + */ 67 + static inline bool xe_page_reclaim_list_valid(struct xe_page_reclaim_list *prl) 68 + { 69 + return !xe_page_reclaim_list_is_new(prl) && 70 + prl->num_entries != XE_PAGE_RECLAIM_INVALID_LIST; 71 + } 72 + 73 + bool xe_page_reclaim_skip(struct xe_tile *tile, struct xe_vma *vma); 74 + struct drm_suballoc *xe_page_reclaim_create_prl_bo(struct xe_tlb_inval *tlb_inval, 75 + struct xe_page_reclaim_list *prl, 76 + struct xe_tlb_inval_fence *fence); 77 + void xe_page_reclaim_list_invalidate(struct xe_page_reclaim_list *prl); 78 + void xe_page_reclaim_list_init(struct xe_page_reclaim_list *prl); 79 + int xe_page_reclaim_list_alloc_entries(struct xe_page_reclaim_list *prl); 80 + /** 81 + * xe_page_reclaim_entries_get() - Increment the reference count of page reclaim entries. 82 + * @entries: Pointer to the array of page reclaim entries. 83 + * 84 + * This function increments the reference count of the backing page. 85 + */ 86 + static inline void xe_page_reclaim_entries_get(struct xe_guc_page_reclaim_entry *entries) 87 + { 88 + if (entries) 89 + get_page(virt_to_page(entries)); 90 + } 91 + 92 + /** 93 + * xe_page_reclaim_entries_put() - Decrement the reference count of page reclaim entries. 94 + * @entries: Pointer to the array of page reclaim entries. 95 + * 96 + * This function decrements the reference count of the backing page 97 + * and frees it if the count reaches zero. 
98 + */ 99 + static inline void xe_page_reclaim_entries_put(struct xe_guc_page_reclaim_entry *entries) 100 + { 101 + if (entries) 102 + put_page(virt_to_page(entries)); 103 + } 104 + 105 + #endif /* _XE_PAGE_RECLAIM_H_ */
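Given the bitfield layout declared above (valid in bit 0, reclaim order in bits 6:1, physical address split across ADDR_LO bits 31:12 and ADDR_HI bits 51:32), one reclaim entry packs into a single qword. A standalone sketch of the packing; GENMASK_ULL()/FIELD_PREP() are replaced by local equivalents and XE_PTE_SHIFT is assumed to be 12 (4K pages):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_PTE_SHIFT	12				/* assumed: 4K GPU pages */
#define DEMO_VALID	(1ULL << 0)			/* valid reclaim entry bit */
#define DEMO_SIZE_SHIFT	1				/* order lives in bits 6:1 */
#define DEMO_ADDR_MASK	(((1ULL << 40) - 1) << 12)	/* address bits 51:12 */

static uint64_t demo_prl_entry(uint64_t phys_addr, unsigned int order)
{
	/* page_size = 1 << (DEMO_PTE_SHIFT + order): order 0 = 4K, order 9 = 2M */
	return DEMO_VALID |
	       ((uint64_t)(order & 0x3f) << DEMO_SIZE_SHIFT) |
	       (phys_addr & DEMO_ADDR_MASK);
}

int main(void)
{
	printf("entry = 0x%016" PRIx64 "\n", demo_prl_entry(0x1234567000ULL, 0));
	return 0;
}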
+18 -18
drivers/gpu/drm/xe/xe_pagefault.c
··· 223 223 224 224 static void xe_pagefault_print(struct xe_pagefault *pf) 225 225 { 226 - xe_gt_dbg(pf->gt, "\n\tASID: %d\n" 227 - "\tFaulted Address: 0x%08x%08x\n" 228 - "\tFaultType: %d\n" 229 - "\tAccessType: %d\n" 230 - "\tFaultLevel: %d\n" 231 - "\tEngineClass: %d %s\n" 232 - "\tEngineInstance: %d\n", 233 - pf->consumer.asid, 234 - upper_32_bits(pf->consumer.page_addr), 235 - lower_32_bits(pf->consumer.page_addr), 236 - pf->consumer.fault_type, 237 - pf->consumer.access_type, 238 - pf->consumer.fault_level, 239 - pf->consumer.engine_class, 240 - xe_hw_engine_class_to_str(pf->consumer.engine_class), 241 - pf->consumer.engine_instance); 226 + xe_gt_info(pf->gt, "\n\tASID: %d\n" 227 + "\tFaulted Address: 0x%08x%08x\n" 228 + "\tFaultType: %d\n" 229 + "\tAccessType: %d\n" 230 + "\tFaultLevel: %d\n" 231 + "\tEngineClass: %d %s\n" 232 + "\tEngineInstance: %d\n", 233 + pf->consumer.asid, 234 + upper_32_bits(pf->consumer.page_addr), 235 + lower_32_bits(pf->consumer.page_addr), 236 + pf->consumer.fault_type, 237 + pf->consumer.access_type, 238 + pf->consumer.fault_level, 239 + pf->consumer.engine_class, 240 + xe_hw_engine_class_to_str(pf->consumer.engine_class), 241 + pf->consumer.engine_instance); 242 242 } 243 243 244 244 static void xe_pagefault_queue_work(struct work_struct *w) ··· 260 260 err = xe_pagefault_service(&pf); 261 261 if (err) { 262 262 xe_pagefault_print(&pf); 263 - xe_gt_dbg(pf.gt, "Fault response: Unsuccessful %pe\n", 264 - ERR_PTR(err)); 263 + xe_gt_info(pf.gt, "Fault response: Unsuccessful %pe\n", 264 + ERR_PTR(err)); 265 265 } 266 266 267 267 pf.producer.ops->ack_fault(&pf, err);
+152 -68
drivers/gpu/drm/xe/xe_pat.c
··· 9 9 10 10 #include <generated/xe_wa_oob.h> 11 11 12 + #include "regs/xe_gt_regs.h" 12 13 #include "regs/xe_reg_defs.h" 13 14 #include "xe_assert.h" 14 15 #include "xe_device.h" ··· 51 50 #define XELP_PAT_WC REG_FIELD_PREP(XELP_MEM_TYPE_MASK, 1) 52 51 #define XELP_PAT_UC REG_FIELD_PREP(XELP_MEM_TYPE_MASK, 0) 53 52 53 + #define PAT_LABEL_LEN 20 54 + 54 55 static const char *XELP_MEM_TYPE_STR_MAP[] = { "UC", "WC", "WT", "WB" }; 56 + 57 + static void xe_pat_index_label(char *label, size_t len, int index) 58 + { 59 + snprintf(label, len, "PAT[%2d] ", index); 60 + } 61 + 62 + static void xelp_pat_entry_dump(struct drm_printer *p, int index, u32 pat) 63 + { 64 + u8 mem_type = REG_FIELD_GET(XELP_MEM_TYPE_MASK, pat); 65 + 66 + drm_printf(p, "PAT[%2d] = %s (%#8x)\n", index, 67 + XELP_MEM_TYPE_STR_MAP[mem_type], pat); 68 + } 69 + 70 + static void xehpc_pat_entry_dump(struct drm_printer *p, int index, u32 pat) 71 + { 72 + drm_printf(p, "PAT[%2d] = [ %u, %u ] (%#8x)\n", index, 73 + REG_FIELD_GET(XELP_MEM_TYPE_MASK, pat), 74 + REG_FIELD_GET(XEHPC_CLOS_LEVEL_MASK, pat), pat); 75 + } 76 + 77 + static void xelpg_pat_entry_dump(struct drm_printer *p, int index, u32 pat) 78 + { 79 + drm_printf(p, "PAT[%2d] = [ %u, %u ] (%#8x)\n", index, 80 + REG_FIELD_GET(XELPG_L4_POLICY_MASK, pat), 81 + REG_FIELD_GET(XELPG_INDEX_COH_MODE_MASK, pat), pat); 82 + } 55 83 56 84 struct xe_pat_ops { 57 85 void (*program_graphics)(struct xe_gt *gt, const struct xe_pat_table_entry table[], ··· 226 196 return xe->pat.table[pat_index].coh_mode; 227 197 } 228 198 199 + bool xe_pat_index_get_comp_en(struct xe_device *xe, u16 pat_index) 200 + { 201 + WARN_ON(pat_index >= xe->pat.n_entries); 202 + return !!(xe->pat.table[pat_index].value & XE2_COMP_EN); 203 + } 204 + 205 + u16 xe_pat_index_get_l3_policy(struct xe_device *xe, u16 pat_index) 206 + { 207 + WARN_ON(pat_index >= xe->pat.n_entries); 208 + 209 + return REG_FIELD_GET(XE2_L3_POLICY, xe->pat.table[pat_index].value); 210 + } 211 + 229 212 static void program_pat(struct xe_gt *gt, const struct xe_pat_table_entry table[], 230 213 int n_entries) 231 214 { ··· 276 233 static int xelp_dump(struct xe_gt *gt, struct drm_printer *p) 277 234 { 278 235 struct xe_device *xe = gt_to_xe(gt); 279 - unsigned int fw_ref; 280 236 int i; 281 237 282 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); 283 - if (!fw_ref) 238 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT); 239 + if (!fw_ref.domains) 284 240 return -ETIMEDOUT; 285 241 286 242 drm_printf(p, "PAT table:\n"); 287 243 288 244 for (i = 0; i < xe->pat.n_entries; i++) { 289 245 u32 pat = xe_mmio_read32(&gt->mmio, XE_REG(_PAT_INDEX(i))); 290 - u8 mem_type = REG_FIELD_GET(XELP_MEM_TYPE_MASK, pat); 291 246 292 - drm_printf(p, "PAT[%2d] = %s (%#8x)\n", i, 293 - XELP_MEM_TYPE_STR_MAP[mem_type], pat); 247 + xelp_pat_entry_dump(p, i, pat); 294 248 } 295 249 296 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 297 250 return 0; 298 251 } 299 252 ··· 301 262 static int xehp_dump(struct xe_gt *gt, struct drm_printer *p) 302 263 { 303 264 struct xe_device *xe = gt_to_xe(gt); 304 - unsigned int fw_ref; 305 265 int i; 306 266 307 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); 308 - if (!fw_ref) 267 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT); 268 + if (!fw_ref.domains) 309 269 return -ETIMEDOUT; 310 270 311 271 drm_printf(p, "PAT table:\n"); 312 272 313 273 for (i = 0; i < xe->pat.n_entries; i++) { 314 274 u32 pat = xe_gt_mcr_unicast_read_any(gt, XE_REG_MCR(_PAT_INDEX(i))); 315 - u8 mem_type; 316 275 317 - mem_type = 
REG_FIELD_GET(XELP_MEM_TYPE_MASK, pat); 318 - 319 - drm_printf(p, "PAT[%2d] = %s (%#8x)\n", i, 320 - XELP_MEM_TYPE_STR_MAP[mem_type], pat); 276 + xelp_pat_entry_dump(p, i, pat); 321 277 } 322 278 323 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 324 279 return 0; 325 280 } 326 281 ··· 326 293 static int xehpc_dump(struct xe_gt *gt, struct drm_printer *p) 327 294 { 328 295 struct xe_device *xe = gt_to_xe(gt); 329 - unsigned int fw_ref; 330 296 int i; 331 297 332 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); 333 - if (!fw_ref) 298 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT); 299 + if (!fw_ref.domains) 334 300 return -ETIMEDOUT; 335 301 336 302 drm_printf(p, "PAT table:\n"); ··· 337 305 for (i = 0; i < xe->pat.n_entries; i++) { 338 306 u32 pat = xe_gt_mcr_unicast_read_any(gt, XE_REG_MCR(_PAT_INDEX(i))); 339 307 340 - drm_printf(p, "PAT[%2d] = [ %u, %u ] (%#8x)\n", i, 341 - REG_FIELD_GET(XELP_MEM_TYPE_MASK, pat), 342 - REG_FIELD_GET(XEHPC_CLOS_LEVEL_MASK, pat), pat); 308 + xehpc_pat_entry_dump(p, i, pat); 343 309 } 344 310 345 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 346 311 return 0; 347 312 } 348 313 ··· 351 322 static int xelpg_dump(struct xe_gt *gt, struct drm_printer *p) 352 323 { 353 324 struct xe_device *xe = gt_to_xe(gt); 354 - unsigned int fw_ref; 355 325 int i; 356 326 357 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); 358 - if (!fw_ref) 327 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT); 328 + if (!fw_ref.domains) 359 329 return -ETIMEDOUT; 360 330 361 331 drm_printf(p, "PAT table:\n"); ··· 367 339 else 368 340 pat = xe_gt_mcr_unicast_read_any(gt, XE_REG_MCR(_PAT_INDEX(i))); 369 341 370 - drm_printf(p, "PAT[%2d] = [ %u, %u ] (%#8x)\n", i, 371 - REG_FIELD_GET(XELPG_L4_POLICY_MASK, pat), 372 - REG_FIELD_GET(XELPG_INDEX_COH_MODE_MASK, pat), pat); 342 + xelpg_pat_entry_dump(p, i, pat); 373 343 } 374 344 375 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 376 345 return 0; 377 346 } 378 347 ··· 383 358 .dump = xelpg_dump, 384 359 }; 385 360 361 + static void xe2_pat_entry_dump(struct drm_printer *p, const char *label, u32 pat, bool rsvd) 362 + { 363 + drm_printf(p, "%s= [ %u, %u, %u, %u, %u, %u ] (%#8x)%s\n", label, 364 + !!(pat & XE2_NO_PROMOTE), 365 + !!(pat & XE2_COMP_EN), 366 + REG_FIELD_GET(XE2_L3_CLOS, pat), 367 + REG_FIELD_GET(XE2_L3_POLICY, pat), 368 + REG_FIELD_GET(XE2_L4_POLICY, pat), 369 + REG_FIELD_GET(XE2_COH_MODE, pat), 370 + pat, rsvd ? " *" : ""); 371 + } 372 + 373 + static void xe3p_xpc_pat_entry_dump(struct drm_printer *p, const char *label, u32 pat, bool rsvd) 374 + { 375 + drm_printf(p, "%s= [ %u, %u, %u, %u, %u ] (%#8x)%s\n", label, 376 + !!(pat & XE2_NO_PROMOTE), 377 + REG_FIELD_GET(XE2_L3_CLOS, pat), 378 + REG_FIELD_GET(XE2_L3_POLICY, pat), 379 + REG_FIELD_GET(XE2_L4_POLICY, pat), 380 + REG_FIELD_GET(XE2_COH_MODE, pat), 381 + pat, rsvd ? 
" *" : ""); 382 + } 383 + 386 384 static int xe2_dump(struct xe_gt *gt, struct drm_printer *p) 387 385 { 388 386 struct xe_device *xe = gt_to_xe(gt); 389 - unsigned int fw_ref; 390 387 u32 pat; 391 388 int i; 389 + char label[PAT_LABEL_LEN]; 392 390 393 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); 394 - if (!fw_ref) 391 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT); 392 + if (!fw_ref.domains) 395 393 return -ETIMEDOUT; 396 394 397 395 drm_printf(p, "PAT table: (* = reserved entry)\n"); ··· 425 377 else 426 378 pat = xe_gt_mcr_unicast_read_any(gt, XE_REG_MCR(_PAT_INDEX(i))); 427 379 428 - drm_printf(p, "PAT[%2d] = [ %u, %u, %u, %u, %u, %u ] (%#8x)%s\n", i, 429 - !!(pat & XE2_NO_PROMOTE), 430 - !!(pat & XE2_COMP_EN), 431 - REG_FIELD_GET(XE2_L3_CLOS, pat), 432 - REG_FIELD_GET(XE2_L3_POLICY, pat), 433 - REG_FIELD_GET(XE2_L4_POLICY, pat), 434 - REG_FIELD_GET(XE2_COH_MODE, pat), 435 - pat, xe->pat.table[i].valid ? "" : " *"); 380 + xe_pat_index_label(label, sizeof(label), i); 381 + xe2_pat_entry_dump(p, label, pat, !xe->pat.table[i].valid); 436 382 } 437 383 438 384 /* ··· 439 397 pat = xe_gt_mcr_unicast_read_any(gt, XE_REG_MCR(_PAT_PTA)); 440 398 441 399 drm_printf(p, "Page Table Access:\n"); 442 - drm_printf(p, "PTA_MODE= [ %u, %u, %u, %u, %u, %u ] (%#8x)\n", 443 - !!(pat & XE2_NO_PROMOTE), 444 - !!(pat & XE2_COMP_EN), 445 - REG_FIELD_GET(XE2_L3_CLOS, pat), 446 - REG_FIELD_GET(XE2_L3_POLICY, pat), 447 - REG_FIELD_GET(XE2_L4_POLICY, pat), 448 - REG_FIELD_GET(XE2_COH_MODE, pat), 449 - pat); 400 + xe2_pat_entry_dump(p, "PTA_MODE", pat, false); 450 401 451 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 452 402 return 0; 453 403 } 454 404 ··· 453 419 static int xe3p_xpc_dump(struct xe_gt *gt, struct drm_printer *p) 454 420 { 455 421 struct xe_device *xe = gt_to_xe(gt); 456 - unsigned int fw_ref; 457 422 u32 pat; 458 423 int i; 424 + char label[PAT_LABEL_LEN]; 459 425 460 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); 461 - if (!fw_ref) 426 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT); 427 + if (!fw_ref.domains) 462 428 return -ETIMEDOUT; 463 429 464 430 drm_printf(p, "PAT table: (* = reserved entry)\n"); ··· 466 432 for (i = 0; i < xe->pat.n_entries; i++) { 467 433 pat = xe_gt_mcr_unicast_read_any(gt, XE_REG_MCR(_PAT_INDEX(i))); 468 434 469 - drm_printf(p, "PAT[%2d] = [ %u, %u, %u, %u, %u ] (%#8x)%s\n", i, 470 - !!(pat & XE2_NO_PROMOTE), 471 - REG_FIELD_GET(XE2_L3_CLOS, pat), 472 - REG_FIELD_GET(XE2_L3_POLICY, pat), 473 - REG_FIELD_GET(XE2_L4_POLICY, pat), 474 - REG_FIELD_GET(XE2_COH_MODE, pat), 475 - pat, xe->pat.table[i].valid ? "" : " *"); 435 + xe_pat_index_label(label, sizeof(label), i); 436 + xe3p_xpc_pat_entry_dump(p, label, pat, !xe->pat.table[i].valid); 476 437 } 477 438 478 439 /* ··· 477 448 pat = xe_gt_mcr_unicast_read_any(gt, XE_REG_MCR(_PAT_PTA)); 478 449 479 450 drm_printf(p, "Page Table Access:\n"); 480 - drm_printf(p, "PTA_MODE= [ %u, %u, %u, %u, %u ] (%#8x)\n", 481 - !!(pat & XE2_NO_PROMOTE), 482 - REG_FIELD_GET(XE2_L3_CLOS, pat), 483 - REG_FIELD_GET(XE2_L3_POLICY, pat), 484 - REG_FIELD_GET(XE2_L4_POLICY, pat), 485 - REG_FIELD_GET(XE2_COH_MODE, pat), 486 - pat); 451 + xe3p_xpc_pat_entry_dump(p, "PTA_MODE", pat, false); 487 452 488 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 489 453 return 0; 490 454 } 491 455 ··· 599 577 return -EOPNOTSUPP; 600 578 601 579 return xe->pat.ops->dump(gt, p); 580 + } 581 + 582 + /** 583 + * xe_pat_dump_sw_config() - Dump the software-configured GT PAT table into a drm printer. 
584 + * @gt: the &xe_gt 585 + * @p: the &drm_printer 586 + * 587 + * Return: 0 on success or a negative error code on failure. 588 + */ 589 + int xe_pat_dump_sw_config(struct xe_gt *gt, struct drm_printer *p) 590 + { 591 + struct xe_device *xe = gt_to_xe(gt); 592 + char label[PAT_LABEL_LEN]; 593 + 594 + if (!xe->pat.table || !xe->pat.n_entries) 595 + return -EOPNOTSUPP; 596 + 597 + drm_printf(p, "PAT table:%s\n", GRAPHICS_VER(xe) >= 20 ? " (* = reserved entry)" : ""); 598 + for (u32 i = 0; i < xe->pat.n_entries; i++) { 599 + u32 pat = xe->pat.table[i].value; 600 + 601 + if (GRAPHICS_VERx100(xe) == 3511) { 602 + xe_pat_index_label(label, sizeof(label), i); 603 + xe3p_xpc_pat_entry_dump(p, label, pat, !xe->pat.table[i].valid); 604 + } else if (GRAPHICS_VER(xe) == 30 || GRAPHICS_VER(xe) == 20) { 605 + xe_pat_index_label(label, sizeof(label), i); 606 + xe2_pat_entry_dump(p, label, pat, !xe->pat.table[i].valid); 607 + } else if (xe->info.platform == XE_METEORLAKE) { 608 + xelpg_pat_entry_dump(p, i, pat); 609 + } else if (xe->info.platform == XE_PVC) { 610 + xehpc_pat_entry_dump(p, i, pat); 611 + } else if (xe->info.platform == XE_DG2 || GRAPHICS_VERx100(xe) <= 1210) { 612 + xelp_pat_entry_dump(p, i, pat); 613 + } else { 614 + return -EOPNOTSUPP; 615 + } 616 + } 617 + 618 + if (xe->pat.pat_pta) { 619 + u32 pat = xe->pat.pat_pta->value; 620 + 621 + drm_printf(p, "Page Table Access:\n"); 622 + xe2_pat_entry_dump(p, "PTA_MODE", pat, false); 623 + } 624 + 625 + if (xe->pat.pat_ats) { 626 + u32 pat = xe->pat.pat_ats->value; 627 + 628 + drm_printf(p, "PCIe ATS/PASID:\n"); 629 + xe2_pat_entry_dump(p, "PAT_ATS ", pat, false); 630 + } 631 + 632 + drm_printf(p, "Cache Level:\n"); 633 + drm_printf(p, "IDX[XE_CACHE_NONE] = %d\n", xe->pat.idx[XE_CACHE_NONE]); 634 + drm_printf(p, "IDX[XE_CACHE_WT] = %d\n", xe->pat.idx[XE_CACHE_WT]); 635 + drm_printf(p, "IDX[XE_CACHE_WB] = %d\n", xe->pat.idx[XE_CACHE_WB]); 636 + if (GRAPHICS_VER(xe) >= 20) { 637 + drm_printf(p, "IDX[XE_CACHE_NONE_COMPRESSION] = %d\n", 638 + xe->pat.idx[XE_CACHE_NONE_COMPRESSION]); 639 + } 640 + 641 + return 0; 602 642 }
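The PAT dump paths above drop the manual xe_force_wake_get()/xe_force_wake_put() pairing in favour of the scope-based CLASS(xe_force_wake, ...) helper, so the early -ETIMEDOUT returns no longer need a cleanup label. A minimal sketch of the pattern, assuming only what the converted functions themselves show (example_dump is a hypothetical caller):

static int example_dump(struct xe_gt *gt, struct drm_printer *p)
{
	/* Wake reference taken here; released automatically when fw_ref
	 * goes out of scope, including on the early return below. */
	CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT);
	if (!fw_ref.domains)
		return -ETIMEDOUT;

	drm_printf(p, "forcewake held for the rest of this scope\n");
	return 0;
}

The same conversion recurs below in xe_pxp.c, xe_query.c and xe_reg_sr.c.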
+21
drivers/gpu/drm/xe/xe_pat.h
··· 49 49 void xe_pat_init(struct xe_gt *gt); 50 50 51 51 int xe_pat_dump(struct xe_gt *gt, struct drm_printer *p); 52 + int xe_pat_dump_sw_config(struct xe_gt *gt, struct drm_printer *p); 52 53 53 54 /** 54 55 * xe_pat_index_get_coh_mode - Extract the coherency mode for the given ··· 58 57 * @pat_index: The pat_index to query 59 58 */ 60 59 u16 xe_pat_index_get_coh_mode(struct xe_device *xe, u16 pat_index); 60 + 61 + /** 62 + * xe_pat_index_get_comp_en - Extract the compression enable flag for 63 + * the given pat_index. 64 + * @xe: xe device 65 + * @pat_index: The pat_index to query 66 + * 67 + * Return: true if compression is enabled for this pat_index, false otherwise. 68 + */ 69 + bool xe_pat_index_get_comp_en(struct xe_device *xe, u16 pat_index); 70 + 71 + #define XE_L3_POLICY_WB 0 /* Write-back */ 72 + #define XE_L3_POLICY_XD 1 /* WB - Transient Display */ 73 + #define XE_L3_POLICY_UC 3 /* Uncached */ 74 + /** 75 + * xe_pat_index_get_l3_policy - Extract the L3 policy for the given pat_index. 76 + * @xe: xe device 77 + * @pat_index: The pat_index to query 78 + */ 79 + u16 xe_pat_index_get_l3_policy(struct xe_device *xe, u16 pat_index); 61 80 62 81 #endif
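xe_pat.h now exposes per-pat_index attribute queries next to the L3 policy encodings. A hedged usage sketch; log_pat_attrs() is a hypothetical caller, only the two helpers and the XE_L3_POLICY_* values come from the header above:

static void log_pat_attrs(struct xe_device *xe, u16 pat_index)
{
	bool comp = xe_pat_index_get_comp_en(xe, pat_index);
	u16 l3 = xe_pat_index_get_l3_policy(xe, pat_index);

	drm_dbg(&xe->drm, "pat_index=%u comp_en=%d l3=%s\n",
		pat_index, comp,
		l3 == XE_L3_POLICY_WB ? "WB" :
		l3 == XE_L3_POLICY_XD ? "XD" :
		l3 == XE_L3_POLICY_UC ? "UC" : "unknown");
}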
+29
drivers/gpu/drm/xe/xe_pci.c
··· 108 108 109 109 static const struct xe_graphics_desc graphics_xe3p_xpc = { 110 110 XE2_GFX_FEATURES, 111 + .has_indirect_ring_state = 1, 111 112 .hw_engine_mask = 112 113 GENMASK(XE_HW_ENGINE_BCS8, XE_HW_ENGINE_BCS1) | 113 114 GENMASK(XE_HW_ENGINE_CCS3, XE_HW_ENGINE_CCS0), ··· 169 168 .pre_gmdid_media_ip = &media_ip_xem, 170 169 PLATFORM(TIGERLAKE), 171 170 .dma_mask_size = 39, 171 + .has_cached_pt = true, 172 172 .has_display = true, 173 173 .has_llc = true, 174 174 .has_sriov = true, ··· 184 182 .pre_gmdid_media_ip = &media_ip_xem, 185 183 PLATFORM(ROCKETLAKE), 186 184 .dma_mask_size = 39, 185 + .has_cached_pt = true, 187 186 .has_display = true, 188 187 .has_llc = true, 189 188 .max_gt_per_tile = 1, ··· 200 197 .pre_gmdid_media_ip = &media_ip_xem, 201 198 PLATFORM(ALDERLAKE_S), 202 199 .dma_mask_size = 39, 200 + .has_cached_pt = true, 203 201 .has_display = true, 204 202 .has_llc = true, 205 203 .has_sriov = true, ··· 221 217 .pre_gmdid_media_ip = &media_ip_xem, 222 218 PLATFORM(ALDERLAKE_P), 223 219 .dma_mask_size = 39, 220 + .has_cached_pt = true, 224 221 .has_display = true, 225 222 .has_llc = true, 226 223 .has_sriov = true, ··· 240 235 .pre_gmdid_media_ip = &media_ip_xem, 241 236 PLATFORM(ALDERLAKE_N), 242 237 .dma_mask_size = 39, 238 + .has_cached_pt = true, 243 239 .has_display = true, 244 240 .has_llc = true, 245 241 .has_sriov = true, ··· 367 361 .has_mbx_power_limits = true, 368 362 .has_gsc_nvm = 1, 369 363 .has_heci_cscfi = 1, 364 + .has_i2c = true, 370 365 .has_late_bind = true, 366 + .has_pre_prod_wa = 1, 371 367 .has_sriov = true, 372 368 .has_mem_copy_instr = true, 373 369 .max_gt_per_tile = 2, ··· 389 381 .has_flat_ccs = 1, 390 382 .has_sriov = true, 391 383 .has_mem_copy_instr = true, 384 + .has_pre_prod_wa = 1, 392 385 .max_gt_per_tile = 2, 393 386 .needs_scratch = true, 394 387 .needs_shared_vf_gt_wq = true, ··· 403 394 .has_display = true, 404 395 .has_flat_ccs = 1, 405 396 .has_mem_copy_instr = true, 397 + .has_pre_prod_wa = 1, 406 398 .max_gt_per_tile = 2, 407 399 .require_force_probe = true, 408 400 .va_bits = 48, ··· 416 406 .dma_mask_size = 52, 417 407 .has_display = false, 418 408 .has_flat_ccs = false, 409 + .has_gsc_nvm = 1, 410 + .has_i2c = true, 419 411 .has_mbx_power_limits = true, 412 + .has_mert = true, 413 + .has_pre_prod_wa = 1, 420 414 .has_sriov = true, 421 415 .max_gt_per_tile = 2, 422 416 .require_force_probe = true, ··· 677 663 xe->info.vram_flags = desc->vram_flags; 678 664 679 665 xe->info.is_dgfx = desc->is_dgfx; 666 + xe->info.has_cached_pt = desc->has_cached_pt; 680 667 xe->info.has_fan_control = desc->has_fan_control; 681 668 /* runtime fusing may force flat_ccs to disabled later */ 682 669 xe->info.has_flat_ccs = desc->has_flat_ccs; ··· 685 670 xe->info.has_gsc_nvm = desc->has_gsc_nvm; 686 671 xe->info.has_heci_gscfi = desc->has_heci_gscfi; 687 672 xe->info.has_heci_cscfi = desc->has_heci_cscfi; 673 + xe->info.has_i2c = desc->has_i2c; 688 674 xe->info.has_late_bind = desc->has_late_bind; 689 675 xe->info.has_llc = desc->has_llc; 676 + xe->info.has_mert = desc->has_mert; 677 + xe->info.has_page_reclaim_hw_assist = desc->has_page_reclaim_hw_assist; 678 + xe->info.has_pre_prod_wa = desc->has_pre_prod_wa; 690 679 xe->info.has_pxp = desc->has_pxp; 691 680 xe->info.has_sriov = xe_configfs_primary_gt_allowed(to_pci_dev(xe->drm.dev)) && 692 681 desc->has_sriov; ··· 774 755 gt->info.type = XE_GT_TYPE_MAIN; 775 756 gt->info.id = tile->id * xe->info.max_gt_per_tile; 776 757 gt->info.has_indirect_ring_state = 
graphics_desc->has_indirect_ring_state; 758 + gt->info.multi_queue_engine_class_mask = graphics_desc->multi_queue_engine_class_mask; 777 759 gt->info.engine_mask = graphics_desc->hw_engine_mask; 778 760 779 761 /* ··· 1172 1152 struct pci_dev *pdev = to_pci_dev(dev); 1173 1153 struct xe_device *xe = pdev_to_xe_device(pdev); 1174 1154 int err; 1155 + 1156 + /* 1157 + * We hold an additional reference to the runtime PM to keep PF in D0 1158 + * during VFs lifetime, as our VFs do not implement the PM capability. 1159 + * This means we should never be runtime suspending as long as VFs are 1160 + * enabled. 1161 + */ 1162 + xe_assert(xe, !IS_SRIOV_VF(xe)); 1163 + xe_assert(xe, !pci_num_vf(pdev)); 1175 1164 1176 1165 err = xe_pm_runtime_suspend(xe); 1177 1166 if (err)
+3 -7
drivers/gpu/drm/xe/xe_pci_sriov.c
··· 219 219 int xe_pci_sriov_configure(struct pci_dev *pdev, int num_vfs) 220 220 { 221 221 struct xe_device *xe = pdev_to_xe_device(pdev); 222 - int ret; 223 222 224 223 if (!IS_SRIOV_PF(xe)) 225 224 return -ENODEV; ··· 232 233 if (num_vfs && pci_num_vf(pdev)) 233 234 return -EBUSY; 234 235 235 - xe_pm_runtime_get(xe); 236 + guard(xe_pm_runtime)(xe); 236 237 if (num_vfs > 0) 237 - ret = pf_enable_vfs(xe, num_vfs); 238 + return pf_enable_vfs(xe, num_vfs); 238 239 else 239 - ret = pf_disable_vfs(xe); 240 - xe_pm_runtime_put(xe); 241 - 242 - return ret; 240 + return pf_disable_vfs(xe); 243 241 } 244 242 245 243 /**
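xe_pci_sriov_configure() switches to guard(xe_pm_runtime)(xe), which pairs the runtime PM get with an automatic put on every return path and lets the branches return directly. A minimal sketch of the guard pattern; do_enable()/do_disable() are placeholders, not driver functions:

static int example_with_pm(struct xe_device *xe, bool enable)
{
	guard(xe_pm_runtime)(xe);	/* runtime PM reference held from here... */

	if (enable)
		return do_enable(xe);	/* ...and dropped on this return */
	return do_disable(xe);		/* ...or this one */
}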
+6
drivers/gpu/drm/xe/xe_pci_types.h
··· 37 37 u8 require_force_probe:1; 38 38 u8 is_dgfx:1; 39 39 40 + u8 has_cached_pt:1; 40 41 u8 has_display:1; 41 42 u8 has_fan_control:1; 42 43 u8 has_flat_ccs:1; 43 44 u8 has_gsc_nvm:1; 44 45 u8 has_heci_gscfi:1; 45 46 u8 has_heci_cscfi:1; 47 + u8 has_i2c:1; 46 48 u8 has_late_bind:1; 47 49 u8 has_llc:1; 48 50 u8 has_mbx_power_limits:1; 49 51 u8 has_mem_copy_instr:1; 52 + u8 has_mert:1; 53 + u8 has_pre_prod_wa:1; 54 + u8 has_page_reclaim_hw_assist:1; 50 55 u8 has_pxp:1; 51 56 u8 has_sriov:1; 52 57 u8 needs_scratch:1; ··· 63 58 64 59 struct xe_graphics_desc { 65 60 u64 hw_engine_mask; /* hardware engines provided by graphics IP */ 61 + u16 multi_queue_engine_class_mask; /* bitmask of engine classes which support multi queue */ 66 62 67 63 u8 has_asid:1; 68 64 u8 has_atomic_enable_pte_bit:1;
+2
drivers/gpu/drm/xe/xe_pcode_api.h
··· 77 77 78 78 #define PCODE_SCRATCH(x) XE_REG(0x138320 + ((x) * 4)) 79 79 /* PCODE_SCRATCH0 */ 80 + #define BREADCRUMB_VERSION REG_GENMASK(31, 29) 80 81 #define AUXINFO_REG_OFFSET REG_GENMASK(17, 15) 81 82 #define OVERFLOW_REG_OFFSET REG_GENMASK(14, 12) 82 83 #define HISTORY_TRACKING REG_BIT(11) 83 84 #define OVERFLOW_SUPPORT REG_BIT(10) 84 85 #define AUXINFO_SUPPORT REG_BIT(9) 86 + #define FDO_MODE REG_BIT(4) 85 87 #define BOOT_STATUS REG_GENMASK(3, 1) 86 88 #define CRITICAL_FAILURE 4 87 89 #define NON_CRITICAL_FAILURE 7
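The new PCODE_SCRATCH0 fields can be decoded with the usual REG_FIELD_GET() helpers: BREADCRUMB_VERSION sits in bits 31:29 and FDO_MODE is bit 4. A worked example with a made-up register value:

u32 scratch0 = 0x40000010;                                  /* example value only */
u32 version  = REG_FIELD_GET(BREADCRUMB_VERSION, scratch0); /* bits 31:29 = 0b010 -> version 2 */
bool fdo     = scratch0 & FDO_MODE;                         /* bit 4 set -> FDO mode enabled */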
+5 -5
drivers/gpu/drm/xe/xe_pm.c
··· 591 591 } 592 592 593 593 for_each_gt(gt, xe, id) { 594 - err = xe_gt_suspend(gt); 594 + err = xe->d3cold.allowed ? xe_gt_suspend(gt) : xe_gt_runtime_suspend(gt); 595 595 if (err) 596 596 goto out_resume; 597 597 } ··· 633 633 634 634 xe_rpm_lockmap_acquire(xe); 635 635 636 - for_each_gt(gt, xe, id) 637 - xe_gt_idle_disable_c6(gt); 638 - 639 636 if (xe->d3cold.allowed) { 637 + for_each_gt(gt, xe, id) 638 + xe_gt_idle_disable_c6(gt); 639 + 640 640 err = xe_pcode_ready(xe, true); 641 641 if (err) 642 642 goto out; ··· 657 657 xe_irq_resume(xe); 658 658 659 659 for_each_gt(gt, xe, id) 660 - xe_gt_resume(gt); 660 + xe->d3cold.allowed ? xe_gt_resume(gt) : xe_gt_runtime_resume(gt); 661 661 662 662 xe_display_pm_runtime_resume(xe); 663 663
+1 -1
drivers/gpu/drm/xe/xe_pmu.c
··· 425 425 struct perf_pmu_events_attr *pmu_attr = 426 426 container_of(attr, struct perf_pmu_events_attr, attr); 427 427 428 - return sprintf(buf, "event=%#04llx\n", pmu_attr->id); 428 + return sysfs_emit(buf, "event=%#04llx\n", pmu_attr->id); 429 429 } 430 430 431 431 #define XE_EVENT_ATTR(name_, v_, id_) \
+134 -1
drivers/gpu/drm/xe/xe_pt.c
··· 12 12 #include "xe_exec_queue.h" 13 13 #include "xe_gt.h" 14 14 #include "xe_migrate.h" 15 + #include "xe_page_reclaim.h" 15 16 #include "xe_pt_types.h" 16 17 #include "xe_pt_walk.h" 17 18 #include "xe_res_cursor.h" ··· 1536 1535 /** @modified_end: Walk range start, modified like @modified_start. */ 1537 1536 u64 modified_end; 1538 1537 1538 + /** @prl: Backing pointer to page reclaim list in pt_update_ops */ 1539 + struct xe_page_reclaim_list *prl; 1540 + 1539 1541 /* Output */ 1540 1542 /* @wupd: Structure to track the page-table updates we're building */ 1541 1543 struct xe_walk_update wupd; ··· 1576 1572 return false; 1577 1573 } 1578 1574 1575 + /* Huge 2MB leaf lives directly in a level-1 table and has no children */ 1576 + static bool is_2m_pte(struct xe_pt *pte) 1577 + { 1578 + return pte->level == 1 && !pte->base.children; 1579 + } 1580 + 1581 + /* page_size = 2^(reclamation_size + XE_PTE_SHIFT) */ 1582 + #define COMPUTE_RECLAIM_ADDRESS_MASK(page_size) \ 1583 + ({ \ 1584 + BUILD_BUG_ON(!__builtin_constant_p(page_size)); \ 1585 + ilog2(page_size) - XE_PTE_SHIFT; \ 1586 + }) 1587 + 1588 + static int generate_reclaim_entry(struct xe_tile *tile, 1589 + struct xe_page_reclaim_list *prl, 1590 + u64 pte, struct xe_pt *xe_child) 1591 + { 1592 + struct xe_guc_page_reclaim_entry *reclaim_entries = prl->entries; 1593 + u64 phys_page = (pte & XE_PTE_ADDR_MASK) >> XE_PTE_SHIFT; 1594 + int num_entries = prl->num_entries; 1595 + u32 reclamation_size; 1596 + 1597 + xe_tile_assert(tile, xe_child->level <= MAX_HUGEPTE_LEVEL); 1598 + xe_tile_assert(tile, reclaim_entries); 1599 + xe_tile_assert(tile, num_entries < XE_PAGE_RECLAIM_MAX_ENTRIES - 1); 1600 + 1601 + if (!xe_page_reclaim_list_valid(prl)) 1602 + return -EINVAL; 1603 + 1604 + /** 1605 + * reclamation_size indicates the size of the page to be 1606 + * invalidated and flushed from non-coherent cache. 1607 + * Page size is computed as 2^(reclamation_size + XE_PTE_SHIFT) bytes. 
1608 + * Only 4K, 64K (level 0), and 2M pages are supported by hardware for page reclaim 1609 + */ 1610 + if (xe_child->level == 0 && !(pte & XE_PTE_PS64)) { 1611 + reclamation_size = COMPUTE_RECLAIM_ADDRESS_MASK(SZ_4K); /* reclamation_size = 0 */ 1612 + } else if (xe_child->level == 0) { 1613 + reclamation_size = COMPUTE_RECLAIM_ADDRESS_MASK(SZ_64K); /* reclamation_size = 4 */ 1614 + } else if (is_2m_pte(xe_child)) { 1615 + reclamation_size = COMPUTE_RECLAIM_ADDRESS_MASK(SZ_2M); /* reclamation_size = 9 */ 1616 + } else { 1617 + xe_page_reclaim_list_invalidate(prl); 1618 + vm_dbg(&tile_to_xe(tile)->drm, 1619 + "PRL invalidate: unsupported PTE level=%u pte=%#llx\n", 1620 + xe_child->level, pte); 1621 + return -EINVAL; 1622 + } 1623 + 1624 + reclaim_entries[num_entries].qw = 1625 + FIELD_PREP(XE_PAGE_RECLAIM_VALID, 1) | 1626 + FIELD_PREP(XE_PAGE_RECLAIM_SIZE, reclamation_size) | 1627 + FIELD_PREP(XE_PAGE_RECLAIM_ADDR_LO, phys_page) | 1628 + FIELD_PREP(XE_PAGE_RECLAIM_ADDR_HI, phys_page >> 20); 1629 + prl->num_entries++; 1630 + vm_dbg(&tile_to_xe(tile)->drm, 1631 + "PRL add entry: level=%u pte=%#llx reclamation_size=%u prl_idx=%d\n", 1632 + xe_child->level, pte, reclamation_size, num_entries); 1633 + 1634 + return 0; 1635 + } 1636 + 1579 1637 static int xe_pt_stage_unbind_entry(struct xe_ptw *parent, pgoff_t offset, 1580 1638 unsigned int level, u64 addr, u64 next, 1581 1639 struct xe_ptw **child, ··· 1645 1579 struct xe_pt_walk *walk) 1646 1580 { 1647 1581 struct xe_pt *xe_child = container_of(*child, typeof(*xe_child), base); 1582 + struct xe_pt_stage_unbind_walk *xe_walk = 1583 + container_of(walk, typeof(*xe_walk), base); 1584 + struct xe_device *xe = tile_to_xe(xe_walk->tile); 1648 1585 1649 1586 XE_WARN_ON(!*child); 1650 1587 XE_WARN_ON(!level); 1588 + /* Check for leaf node */ 1589 + if (xe_walk->prl && xe_page_reclaim_list_valid(xe_walk->prl) && 1590 + !xe_child->base.children) { 1591 + struct iosys_map *leaf_map = &xe_child->bo->vmap; 1592 + pgoff_t first = xe_pt_offset(addr, 0, walk); 1593 + pgoff_t count = xe_pt_num_entries(addr, next, 0, walk); 1651 1594 1652 - xe_pt_check_kill(addr, next, level - 1, xe_child, action, walk); 1595 + for (pgoff_t i = 0; i < count; i++) { 1596 + u64 pte = xe_map_rd(xe, leaf_map, (first + i) * sizeof(u64), u64); 1597 + int ret; 1598 + 1599 + /* Account for NULL terminated entry on end (-1) */ 1600 + if (xe_walk->prl->num_entries < XE_PAGE_RECLAIM_MAX_ENTRIES - 1) { 1601 + ret = generate_reclaim_entry(xe_walk->tile, xe_walk->prl, 1602 + pte, xe_child); 1603 + if (ret) 1604 + break; 1605 + } else { 1606 + /* overflow, mark as invalid */ 1607 + xe_page_reclaim_list_invalidate(xe_walk->prl); 1608 + vm_dbg(&xe->drm, 1609 + "PRL invalidate: overflow while adding pte=%#llx", 1610 + pte); 1611 + break; 1612 + } 1613 + } 1614 + } 1615 + 1616 + /* If aborting page walk early, invalidate PRL since PTE may be dropped from this abort */ 1617 + if (xe_pt_check_kill(addr, next, level - 1, xe_child, action, walk) && 1618 + xe_walk->prl && level > 1 && xe_child->base.children && xe_child->num_live != 0) { 1619 + xe_page_reclaim_list_invalidate(xe_walk->prl); 1620 + vm_dbg(&xe->drm, 1621 + "PRL invalidate: kill at level=%u addr=%#llx next=%#llx num_live=%u\n", 1622 + level, addr, next, xe_child->num_live); 1623 + } 1653 1624 1654 1625 return 0; 1655 1626 } ··· 1757 1654 { 1758 1655 u64 start = range ? xe_svm_range_start(range) : xe_vma_start(vma); 1759 1656 u64 end = range ? 
xe_svm_range_end(range) : xe_vma_end(vma); 1657 + struct xe_vm_pgtable_update_op *pt_update_op = 1658 + container_of(entries, struct xe_vm_pgtable_update_op, entries[0]); 1760 1659 struct xe_pt_stage_unbind_walk xe_walk = { 1761 1660 .base = { 1762 1661 .ops = &xe_pt_stage_unbind_ops, ··· 1770 1665 .modified_start = start, 1771 1666 .modified_end = end, 1772 1667 .wupd.entries = entries, 1668 + .prl = pt_update_op->prl, 1773 1669 }; 1774 1670 struct xe_pt *pt = vm->pt_root[tile->id]; 1775 1671 ··· 2003 1897 struct xe_vm_pgtable_update_ops *pt_update_ops, 2004 1898 struct xe_vma *vma) 2005 1899 { 1900 + struct xe_device *xe = tile_to_xe(tile); 2006 1901 u32 current_op = pt_update_ops->current_op; 2007 1902 struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[current_op]; 2008 1903 int err; ··· 2021 1914 pt_op->vma = vma; 2022 1915 pt_op->bind = false; 2023 1916 pt_op->rebind = false; 1917 + /* 1918 + * Maintain one PRL located in pt_update_ops that all others in unbind op reference. 1919 + * Ensure that PRL is allocated only once, and if invalidated, remains an invalidated PRL. 1920 + */ 1921 + if (xe->info.has_page_reclaim_hw_assist && 1922 + xe_page_reclaim_list_is_new(&pt_update_ops->prl)) 1923 + xe_page_reclaim_list_alloc_entries(&pt_update_ops->prl); 1924 + 1925 + /* Page reclaim may not be needed due to other features, so skip the corresponding VMA */ 1926 + pt_op->prl = (xe_page_reclaim_list_valid(&pt_update_ops->prl) && 1927 + !xe_page_reclaim_skip(tile, vma)) ? &pt_update_ops->prl : NULL; 2024 1928 2025 1929 err = vma_reserve_fences(tile_to_xe(tile), vma); 2026 1930 if (err) ··· 2097 1979 pt_op->vma = XE_INVALID_VMA; 2098 1980 pt_op->bind = false; 2099 1981 pt_op->rebind = false; 1982 + pt_op->prl = NULL; 2100 1983 2101 1984 pt_op->num_entries = xe_pt_stage_unbind(tile, vm, NULL, range, 2102 1985 pt_op->entries); ··· 2215 2096 init_llist_head(&pt_update_ops->deferred); 2216 2097 pt_update_ops->start = ~0x0ull; 2217 2098 pt_update_ops->last = 0x0ull; 2099 + xe_page_reclaim_list_init(&pt_update_ops->prl); 2218 2100 } 2219 2101 2220 2102 /** ··· 2513 2393 goto kill_vm_tile1; 2514 2394 } 2515 2395 update.ijob = ijob; 2396 + /* 2397 + * Only add page reclaim for the primary GT. Media GT does not have 2398 + * any PPC to flush, so enabling the PPC flush bit for media is 2399 + * effectively a NOP and provides no performance benefit nor 2400 + * interfere with primary GT. 2401 + */ 2402 + if (xe_page_reclaim_list_valid(&pt_update_ops->prl)) { 2403 + xe_tlb_inval_job_add_page_reclaim(ijob, &pt_update_ops->prl); 2404 + /* Release ref from alloc, job will now handle it */ 2405 + xe_page_reclaim_list_invalidate(&pt_update_ops->prl); 2406 + } 2516 2407 2517 2408 if (tile->media_gt) { 2518 2409 dep_scheduler = to_dep_scheduler(q, tile->media_gt); ··· 2648 2517 struct xe_vm_pgtable_update_ops *pt_update_ops = 2649 2518 &vops->pt_update_ops[tile->id]; 2650 2519 int i; 2520 + 2521 + xe_page_reclaim_entries_put(pt_update_ops->prl.entries); 2651 2522 2652 2523 lockdep_assert_held(&vops->vm->lock); 2653 2524 xe_vm_assert_held(vops->vm);
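The PRL entries built above store a reclamation size rather than a byte count: COMPUTE_RECLAIM_ADDRESS_MASK(page_size) evaluates to ilog2(page_size) - XE_PTE_SHIFT, and the hardware recovers the page size as 2^(reclamation_size + XE_PTE_SHIFT). With XE_PTE_SHIFT equal to 12, as the in-line comments assume, the three supported sizes map as in this illustrative helper (not part of the patch):

static u64 reclaim_entry_page_size(u32 reclamation_size)
{
	return 1ULL << (reclamation_size + XE_PTE_SHIFT);
}

/* reclaim_entry_page_size(0) == SZ_4K  (1 << 12)
 * reclaim_entry_page_size(4) == SZ_64K (1 << 16)
 * reclaim_entry_page_size(9) == SZ_2M  (1 << 21)
 */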
+5
drivers/gpu/drm/xe/xe_pt_types.h
··· 8 8 9 9 #include <linux/types.h> 10 10 11 + #include "xe_page_reclaim.h" 11 12 #include "xe_pt_walk.h" 12 13 13 14 struct xe_bo; ··· 80 79 struct xe_vm_pgtable_update entries[XE_VM_MAX_LEVEL * 2 + 1]; 81 80 /** @vma: VMA for operation, operation not valid if NULL */ 82 81 struct xe_vma *vma; 82 + /** @prl: Backing pointer to page reclaim list of pt_update_ops */ 83 + struct xe_page_reclaim_list *prl; 83 84 /** @num_entries: number of entries for this update operation */ 84 85 u32 num_entries; 85 86 /** @bind: is a bind */ ··· 98 95 struct llist_head deferred; 99 96 /** @q: exec queue for PT operations */ 100 97 struct xe_exec_queue *q; 98 + /** @prl: embedded page reclaim list */ 99 + struct xe_page_reclaim_list prl; 101 100 /** @start: start address of ops */ 102 101 u64 start; 103 102 /** @last: last address of ops */
+18 -37
drivers/gpu/drm/xe/xe_pxp.c
··· 58 58 static bool pxp_prerequisites_done(const struct xe_pxp *pxp) 59 59 { 60 60 struct xe_gt *gt = pxp->gt; 61 - unsigned int fw_ref; 62 61 bool ready; 63 62 64 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL); 63 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FORCEWAKE_ALL); 65 64 66 65 /* 67 66 * If force_wake fails we could falsely report the prerequisites as not ··· 70 71 * PXP. Therefore, we can just log the force_wake error and not escalate 71 72 * it. 72 73 */ 73 - XE_WARN_ON(!xe_force_wake_ref_has_domain(fw_ref, XE_FORCEWAKE_ALL)); 74 + XE_WARN_ON(!xe_force_wake_ref_has_domain(fw_ref.domains, XE_FORCEWAKE_ALL)); 74 75 75 76 /* PXP requires both HuC authentication via GSC and GSC proxy initialized */ 76 77 ready = xe_huc_is_authenticated(&gt->uc.huc, XE_HUC_AUTH_VIA_GSC) && 77 78 xe_gsc_proxy_init_done(&gt->uc.gsc); 78 - 79 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 80 79 81 80 return ready; 82 81 } ··· 101 104 xe_uc_fw_status_to_error(pxp->gt->uc.gsc.fw.status)) 102 105 return -EIO; 103 106 104 - xe_pm_runtime_get(pxp->xe); 107 + guard(xe_pm_runtime)(pxp->xe); 105 108 106 109 /* PXP requires both HuC loaded and GSC proxy initialized */ 107 110 if (pxp_prerequisites_done(pxp)) 108 111 ret = 1; 109 112 110 - xe_pm_runtime_put(pxp->xe); 111 113 return ret; 112 114 } 113 115 ··· 131 135 static int pxp_terminate_hw(struct xe_pxp *pxp) 132 136 { 133 137 struct xe_gt *gt = pxp->gt; 134 - unsigned int fw_ref; 135 138 int ret = 0; 136 139 137 140 drm_dbg(&pxp->xe->drm, "Terminating PXP\n"); 138 141 139 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); 140 - if (!xe_force_wake_ref_has_domain(fw_ref, XE_FW_GT)) { 141 - ret = -EIO; 142 - goto out; 143 - } 142 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT); 143 + if (!xe_force_wake_ref_has_domain(fw_ref.domains, XE_FW_GT)) 144 + return -EIO; 144 145 145 146 /* terminate the hw session */ 146 147 ret = xe_pxp_submit_session_termination(pxp, ARB_SESSION); 147 148 if (ret) 148 - goto out; 149 + return ret; 149 150 150 151 ret = pxp_wait_for_session_state(pxp, ARB_SESSION, false); 151 152 if (ret) 152 - goto out; 153 + return ret; 153 154 154 155 /* Trigger full HW cleanup */ 155 156 xe_mmio_write32(&gt->mmio, KCR_GLOBAL_TERMINATE, 1); 156 157 157 158 /* now we can tell the GSC to clean up its own state */ 158 - ret = xe_pxp_submit_session_invalidation(&pxp->gsc_res, ARB_SESSION); 159 - 160 - out: 161 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 162 - return ret; 159 + return xe_pxp_submit_session_invalidation(&pxp->gsc_res, ARB_SESSION); 163 160 } 164 161 165 162 static void mark_termination_in_progress(struct xe_pxp *pxp) ··· 315 326 { 316 327 u32 val = enable ? 
_MASKED_BIT_ENABLE(KCR_INIT_ALLOW_DISPLAY_ME_WRITES) : 317 328 _MASKED_BIT_DISABLE(KCR_INIT_ALLOW_DISPLAY_ME_WRITES); 318 - unsigned int fw_ref; 319 329 320 - fw_ref = xe_force_wake_get(gt_to_fw(pxp->gt), XE_FW_GT); 321 - if (!xe_force_wake_ref_has_domain(fw_ref, XE_FW_GT)) 330 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(pxp->gt), XE_FW_GT); 331 + if (!xe_force_wake_ref_has_domain(fw_ref.domains, XE_FW_GT)) 322 332 return -EIO; 323 333 324 334 xe_mmio_write32(&pxp->gt->mmio, KCR_INIT, val); 325 - xe_force_wake_put(gt_to_fw(pxp->gt), fw_ref); 326 335 327 336 return 0; 328 337 } ··· 440 453 static int __pxp_start_arb_session(struct xe_pxp *pxp) 441 454 { 442 455 int ret; 443 - unsigned int fw_ref; 444 456 445 - fw_ref = xe_force_wake_get(gt_to_fw(pxp->gt), XE_FW_GT); 446 - if (!xe_force_wake_ref_has_domain(fw_ref, XE_FW_GT)) 457 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(pxp->gt), XE_FW_GT); 458 + if (!xe_force_wake_ref_has_domain(fw_ref.domains, XE_FW_GT)) 447 459 return -EIO; 448 460 449 - if (pxp_session_is_in_play(pxp, ARB_SESSION)) { 450 - ret = -EEXIST; 451 - goto out_force_wake; 452 - } 461 + if (pxp_session_is_in_play(pxp, ARB_SESSION)) 462 + return -EEXIST; 453 463 454 464 ret = xe_pxp_submit_session_init(&pxp->gsc_res, ARB_SESSION); 455 465 if (ret) { 456 466 drm_err(&pxp->xe->drm, "Failed to init PXP arb session: %pe\n", ERR_PTR(ret)); 457 - goto out_force_wake; 467 + return ret; 458 468 } 459 469 460 470 ret = pxp_wait_for_session_state(pxp, ARB_SESSION, true); 461 471 if (ret) { 462 472 drm_err(&pxp->xe->drm, "PXP ARB session failed to go in play%pe\n", ERR_PTR(ret)); 463 - goto out_force_wake; 473 + return ret; 464 474 } 465 475 466 476 drm_dbg(&pxp->xe->drm, "PXP ARB session is active\n"); 467 - 468 - out_force_wake: 469 - xe_force_wake_put(gt_to_fw(pxp->gt), fw_ref); 470 - return ret; 477 + return 0; 471 478 } 472 479 473 480 /**
+12 -11
drivers/gpu/drm/xe/xe_query.c
··· 122 122 __ktime_func_t cpu_clock; 123 123 struct xe_hw_engine *hwe; 124 124 struct xe_gt *gt; 125 - unsigned int fw_ref; 126 125 127 126 if (IS_SRIOV_VF(xe)) 128 127 return -EOPNOTSUPP; ··· 157 158 if (!hwe) 158 159 return -EINVAL; 159 160 160 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL); 161 - if (!xe_force_wake_ref_has_domain(fw_ref, XE_FORCEWAKE_ALL)) { 162 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 163 - return -EIO; 161 + xe_with_force_wake(fw_ref, gt_to_fw(gt), XE_FORCEWAKE_ALL) { 162 + if (!xe_force_wake_ref_has_domain(fw_ref.domains, XE_FORCEWAKE_ALL)) 163 + return -EIO; 164 + 165 + hwe_read_timestamp(hwe, &resp.engine_cycles, &resp.cpu_timestamp, 166 + &resp.cpu_delta, cpu_clock); 164 167 } 165 - 166 - hwe_read_timestamp(hwe, &resp.engine_cycles, &resp.cpu_timestamp, 167 - &resp.cpu_delta, cpu_clock); 168 - 169 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 170 168 171 169 if (GRAPHICS_VER(xe) >= 20) 172 170 resp.width = 64; ··· 338 342 if (xe->info.has_usm && IS_ENABLED(CONFIG_DRM_XE_GPUSVM)) 339 343 config->info[DRM_XE_QUERY_CONFIG_FLAGS] |= 340 344 DRM_XE_QUERY_CONFIG_FLAG_HAS_CPU_ADDR_MIRROR; 345 + if (GRAPHICS_VER(xe) >= 20) 346 + config->info[DRM_XE_QUERY_CONFIG_FLAGS] |= 347 + DRM_XE_QUERY_CONFIG_FLAG_HAS_NO_COMPRESSION_HINT; 341 348 config->info[DRM_XE_QUERY_CONFIG_FLAGS] |= 342 349 DRM_XE_QUERY_CONFIG_FLAG_HAS_LOW_LATENCY; 343 350 config->info[DRM_XE_QUERY_CONFIG_MIN_ALIGNMENT] = ··· 685 686 du->capabilities = DRM_XE_OA_CAPS_BASE | DRM_XE_OA_CAPS_SYNCS | 686 687 DRM_XE_OA_CAPS_OA_BUFFER_SIZE | 687 688 DRM_XE_OA_CAPS_WAIT_NUM_REPORTS | 688 - DRM_XE_OA_CAPS_OAM; 689 + DRM_XE_OA_CAPS_OAM | 690 + DRM_XE_OA_CAPS_OA_UNIT_GT_ID; 691 + du->gt_id = u->gt->info.id; 689 692 j = 0; 690 693 for_each_hw_engine(hwe, gt, hwe_id) { 691 694 if (!xe_hw_engine_is_reserved(hwe) &&
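DRM_XE_QUERY_CONFIG_FLAGS now advertises DRM_XE_QUERY_CONFIG_FLAG_HAS_NO_COMPRESSION_HINT on Xe2+, so userspace can probe for the NO_COMPRESSION BO flag before using it. A userspace-side sketch, assuming config has already been filled by the device-query ioctl:

static bool supports_no_compression_hint(const struct drm_xe_query_config *config)
{
	return config->info[DRM_XE_QUERY_CONFIG_FLAGS] &
	       DRM_XE_QUERY_CONFIG_FLAG_HAS_NO_COMPRESSION_HINT;
}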
+5 -12
drivers/gpu/drm/xe/xe_reg_sr.c
··· 168 168 { 169 169 struct xe_reg_sr_entry *entry; 170 170 unsigned long reg; 171 - unsigned int fw_ref; 172 171 173 172 if (xa_empty(&sr->xa)) 174 173 return; ··· 177 178 178 179 xe_gt_dbg(gt, "Applying %s save-restore MMIOs\n", sr->name); 179 180 180 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL); 181 - if (!xe_force_wake_ref_has_domain(fw_ref, XE_FORCEWAKE_ALL)) 182 - goto err_force_wake; 181 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FORCEWAKE_ALL); 182 + if (!xe_force_wake_ref_has_domain(fw_ref.domains, XE_FORCEWAKE_ALL)) { 183 + xe_gt_err(gt, "Failed to apply, err=-ETIMEDOUT\n"); 184 + return; 185 + } 183 186 184 187 xa_for_each(&sr->xa, reg, entry) 185 188 apply_one_mmio(gt, entry); 186 - 187 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 188 - 189 - return; 190 - 191 - err_force_wake: 192 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 193 - xe_gt_err(gt, "Failed to apply, err=-ETIMEDOUT\n"); 194 189 } 195 190 196 191 /**
+59 -22
drivers/gpu/drm/xe/xe_reg_whitelist.c
··· 9 9 #include "regs/xe_gt_regs.h" 10 10 #include "regs/xe_oa_regs.h" 11 11 #include "regs/xe_regs.h" 12 + #include "xe_device.h" 12 13 #include "xe_gt_types.h" 13 14 #include "xe_gt_printk.h" 14 15 #include "xe_platform_types.h" ··· 25 24 const struct xe_hw_engine *hwe) 26 25 { 27 26 return hwe->class != XE_ENGINE_CLASS_RENDER; 27 + } 28 + 29 + static bool match_has_mert(const struct xe_device *xe, 30 + const struct xe_gt *gt, 31 + const struct xe_hw_engine *hwe) 32 + { 33 + return xe_device_has_mert((struct xe_device *)xe); 28 34 } 29 35 30 36 static const struct xe_rtp_entry_sr register_whitelist[] = { ··· 75 67 ENGINE_CLASS(RENDER)), 76 68 XE_RTP_ACTIONS(WHITELIST(CSBE_DEBUG_STATUS(RENDER_RING_BASE), 0)) 77 69 }, 78 - { XE_RTP_NAME("oa_reg_render"), 79 - XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1200, XE_RTP_END_VERSION_UNDEFINED), 80 - ENGINE_CLASS(RENDER)), 81 - XE_RTP_ACTIONS(WHITELIST(OAG_MMIOTRIGGER, 82 - RING_FORCE_TO_NONPRIV_ACCESS_RW), 83 - WHITELIST(OAG_OASTATUS, 84 - RING_FORCE_TO_NONPRIV_ACCESS_RD), 85 - WHITELIST(OAG_OAHEADPTR, 86 - RING_FORCE_TO_NONPRIV_ACCESS_RD | 87 - RING_FORCE_TO_NONPRIV_RANGE_4)) 88 - }, 89 - { XE_RTP_NAME("oa_reg_compute"), 90 - XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1200, XE_RTP_END_VERSION_UNDEFINED), 91 - ENGINE_CLASS(COMPUTE)), 92 - XE_RTP_ACTIONS(WHITELIST(OAG_MMIOTRIGGER, 93 - RING_FORCE_TO_NONPRIV_ACCESS_RW), 94 - WHITELIST(OAG_OASTATUS, 95 - RING_FORCE_TO_NONPRIV_ACCESS_RD), 96 - WHITELIST(OAG_OAHEADPTR, 97 - RING_FORCE_TO_NONPRIV_ACCESS_RD | 98 - RING_FORCE_TO_NONPRIV_RANGE_4)) 99 - }, 100 70 { XE_RTP_NAME("14024997852"), 101 71 XE_RTP_RULES(GRAPHICS_VERSION_RANGE(3000, 3005), ENGINE_CLASS(RENDER)), 102 72 XE_RTP_ACTIONS(WHITELIST(FF_MODE, 103 73 RING_FORCE_TO_NONPRIV_ACCESS_RW), 104 74 WHITELIST(VFLSKPD, 105 75 RING_FORCE_TO_NONPRIV_ACCESS_RW)) 76 + }, 77 + 78 + #define WHITELIST_OA_MMIO_TRG(trg, status, head) \ 79 + WHITELIST(trg, RING_FORCE_TO_NONPRIV_ACCESS_RW), \ 80 + WHITELIST(status, RING_FORCE_TO_NONPRIV_ACCESS_RD), \ 81 + WHITELIST(head, RING_FORCE_TO_NONPRIV_ACCESS_RD | RING_FORCE_TO_NONPRIV_RANGE_4) 82 + 83 + #define WHITELIST_OAG_MMIO_TRG \ 84 + WHITELIST_OA_MMIO_TRG(OAG_MMIOTRIGGER, OAG_OASTATUS, OAG_OAHEADPTR) 85 + 86 + #define WHITELIST_OAM_MMIO_TRG \ 87 + WHITELIST_OA_MMIO_TRG(OAM_MMIO_TRG(XE_OAM_SAG_BASE_ADJ), \ 88 + OAM_STATUS(XE_OAM_SAG_BASE_ADJ), \ 89 + OAM_HEAD_POINTER(XE_OAM_SAG_BASE_ADJ)), \ 90 + WHITELIST_OA_MMIO_TRG(OAM_MMIO_TRG(XE_OAM_SCMI_0_BASE_ADJ), \ 91 + OAM_STATUS(XE_OAM_SCMI_0_BASE_ADJ), \ 92 + OAM_HEAD_POINTER(XE_OAM_SCMI_0_BASE_ADJ)), \ 93 + WHITELIST_OA_MMIO_TRG(OAM_MMIO_TRG(XE_OAM_SCMI_1_BASE_ADJ), \ 94 + OAM_STATUS(XE_OAM_SCMI_1_BASE_ADJ), \ 95 + OAM_HEAD_POINTER(XE_OAM_SCMI_1_BASE_ADJ)) 96 + 97 + #define WHITELIST_OA_MERT_MMIO_TRG \ 98 + WHITELIST_OA_MMIO_TRG(OAMERT_MMIO_TRG, OAMERT_STATUS, OAMERT_HEAD_POINTER) 99 + 100 + { XE_RTP_NAME("oag_mmio_trg_rcs"), 101 + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1200, XE_RTP_END_VERSION_UNDEFINED), 102 + ENGINE_CLASS(RENDER)), 103 + XE_RTP_ACTIONS(WHITELIST_OAG_MMIO_TRG) 104 + }, 105 + { XE_RTP_NAME("oag_mmio_trg_ccs"), 106 + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1200, XE_RTP_END_VERSION_UNDEFINED), 107 + ENGINE_CLASS(COMPUTE)), 108 + XE_RTP_ACTIONS(WHITELIST_OAG_MMIO_TRG) 109 + }, 110 + { XE_RTP_NAME("oam_mmio_trg_vcs"), 111 + XE_RTP_RULES(MEDIA_VERSION_RANGE(1300, XE_RTP_END_VERSION_UNDEFINED), 112 + ENGINE_CLASS(VIDEO_DECODE)), 113 + XE_RTP_ACTIONS(WHITELIST_OAM_MMIO_TRG) 114 + }, 115 + { XE_RTP_NAME("oam_mmio_trg_vecs"), 116 + XE_RTP_RULES(MEDIA_VERSION_RANGE(1300, 
XE_RTP_END_VERSION_UNDEFINED), 117 + ENGINE_CLASS(VIDEO_ENHANCE)), 118 + XE_RTP_ACTIONS(WHITELIST_OAM_MMIO_TRG) 119 + }, 120 + { XE_RTP_NAME("oa_mert_mmio_trg_ccs"), 121 + XE_RTP_RULES(FUNC(match_has_mert), ENGINE_CLASS(COMPUTE)), 122 + XE_RTP_ACTIONS(WHITELIST_OA_MERT_MMIO_TRG) 123 + }, 124 + { XE_RTP_NAME("oa_mert_mmio_trg_bcs"), 125 + XE_RTP_RULES(FUNC(match_has_mert), ENGINE_CLASS(COPY)), 126 + XE_RTP_ACTIONS(WHITELIST_OA_MERT_MMIO_TRG) 106 127 }, 107 128 }; 108 129
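The WHITELIST_OA_MMIO_TRG() macros keep the previous behaviour; for instance WHITELIST_OAG_MMIO_TRG expands to the same three actions the removed "oa_reg_render"/"oa_reg_compute" entries spelled out:

WHITELIST(OAG_MMIOTRIGGER, RING_FORCE_TO_NONPRIV_ACCESS_RW),
WHITELIST(OAG_OASTATUS,    RING_FORCE_TO_NONPRIV_ACCESS_RD),
WHITELIST(OAG_OAHEADPTR,   RING_FORCE_TO_NONPRIV_ACCESS_RD |
                           RING_FORCE_TO_NONPRIV_RANGE_4)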
+35 -35
drivers/gpu/drm/xe/xe_ring_ops.c
··· 12 12 #include "regs/xe_engine_regs.h" 13 13 #include "regs/xe_gt_regs.h" 14 14 #include "regs/xe_lrc_layout.h" 15 - #include "xe_exec_queue_types.h" 15 + #include "xe_exec_queue.h" 16 16 #include "xe_gt.h" 17 17 #include "xe_lrc.h" 18 18 #include "xe_macros.h" ··· 135 135 return i; 136 136 } 137 137 138 - static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb, u32 *dw, 139 - int i) 138 + static int emit_pipe_invalidate(struct xe_exec_queue *q, u32 mask_flags, 139 + bool invalidate_tlb, u32 *dw, int i) 140 140 { 141 141 u32 flags0 = 0; 142 - u32 flags1 = PIPE_CONTROL_CS_STALL | 143 - PIPE_CONTROL_COMMAND_CACHE_INVALIDATE | 142 + u32 flags1 = PIPE_CONTROL_COMMAND_CACHE_INVALIDATE | 144 143 PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE | 145 144 PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE | 146 145 PIPE_CONTROL_VF_CACHE_INVALIDATE | ··· 150 151 151 152 if (invalidate_tlb) 152 153 flags1 |= PIPE_CONTROL_TLB_INVALIDATE; 154 + 155 + if (xe_exec_queue_is_multi_queue(q)) 156 + flags0 |= PIPE_CONTROL0_QUEUE_DRAIN_MODE; 157 + else 158 + flags1 |= PIPE_CONTROL_CS_STALL; 153 159 154 160 flags1 &= ~mask_flags; 155 161 ··· 179 175 180 176 static int emit_render_cache_flush(struct xe_sched_job *job, u32 *dw, int i) 181 177 { 182 - struct xe_gt *gt = job->q->gt; 178 + struct xe_exec_queue *q = job->q; 179 + struct xe_gt *gt = q->gt; 183 180 bool lacks_render = !(gt->info.engine_mask & XE_HW_ENGINE_RCS_MASK); 184 - u32 flags; 181 + u32 flags0, flags1; 185 182 186 183 if (XE_GT_WA(gt, 14016712196)) 187 184 i = emit_pipe_control(dw, i, 0, PIPE_CONTROL_DEPTH_CACHE_FLUSH, 188 185 LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0); 189 186 190 - flags = (PIPE_CONTROL_CS_STALL | 191 - PIPE_CONTROL_TILE_CACHE_FLUSH | 187 + flags0 = PIPE_CONTROL0_HDC_PIPELINE_FLUSH; 188 + flags1 = (PIPE_CONTROL_TILE_CACHE_FLUSH | 192 189 PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH | 193 190 PIPE_CONTROL_DEPTH_CACHE_FLUSH | 194 191 PIPE_CONTROL_DC_FLUSH_ENABLE | 195 192 PIPE_CONTROL_FLUSH_ENABLE); 196 193 197 194 if (XE_GT_WA(gt, 1409600907)) 198 - flags |= PIPE_CONTROL_DEPTH_STALL; 195 + flags1 |= PIPE_CONTROL_DEPTH_STALL; 199 196 200 197 if (lacks_render) 201 - flags &= ~PIPE_CONTROL_3D_ARCH_FLAGS; 198 + flags1 &= ~PIPE_CONTROL_3D_ARCH_FLAGS; 202 199 else if (job->q->class == XE_ENGINE_CLASS_COMPUTE) 203 - flags &= ~PIPE_CONTROL_3D_ENGINE_FLAGS; 200 + flags1 &= ~PIPE_CONTROL_3D_ENGINE_FLAGS; 204 201 205 - return emit_pipe_control(dw, i, PIPE_CONTROL0_HDC_PIPELINE_FLUSH, flags, 0, 0); 202 + if (xe_exec_queue_is_multi_queue(q)) 203 + flags0 |= PIPE_CONTROL0_QUEUE_DRAIN_MODE; 204 + else 205 + flags1 |= PIPE_CONTROL_CS_STALL; 206 + 207 + return emit_pipe_control(dw, i, flags0, flags1, 0, 0); 206 208 } 207 209 208 - static int emit_pipe_control_to_ring_end(struct xe_hw_engine *hwe, u32 *dw, int i) 210 + static int emit_pipe_imm_ggtt(struct xe_exec_queue *q, u32 addr, u32 value, 211 + bool stall_only, u32 *dw, int i) 209 212 { 210 - if (hwe->class != XE_ENGINE_CLASS_RENDER) 211 - return i; 212 - 213 - if (XE_GT_WA(hwe->gt, 16020292621)) 214 - i = emit_pipe_control(dw, i, 0, PIPE_CONTROL_LRI_POST_SYNC, 215 - RING_NOPID(hwe->mmio_base).addr, 0); 216 - 217 - return i; 218 - } 219 - 220 - static int emit_pipe_imm_ggtt(u32 addr, u32 value, bool stall_only, u32 *dw, 221 - int i) 222 - { 223 - u32 flags = PIPE_CONTROL_CS_STALL | PIPE_CONTROL_GLOBAL_GTT_IVB | 224 - PIPE_CONTROL_QW_WRITE; 213 + u32 flags0 = 0, flags1 = PIPE_CONTROL_GLOBAL_GTT_IVB | PIPE_CONTROL_QW_WRITE; 225 214 226 215 if (!stall_only) 227 - flags |= PIPE_CONTROL_FLUSH_ENABLE; 216 + 
flags1 |= PIPE_CONTROL_FLUSH_ENABLE; 228 217 229 - return emit_pipe_control(dw, i, 0, flags, addr, value); 218 + if (xe_exec_queue_is_multi_queue(q)) 219 + flags0 |= PIPE_CONTROL0_QUEUE_DRAIN_MODE; 220 + else 221 + flags1 |= PIPE_CONTROL_CS_STALL; 222 + 223 + return emit_pipe_control(dw, i, flags0, flags1, addr, value); 230 224 } 231 225 232 226 static u32 get_ppgtt_flag(struct xe_sched_job *job) ··· 373 371 mask_flags = PIPE_CONTROL_3D_ENGINE_FLAGS; 374 372 375 373 /* See __xe_pt_bind_vma() for a discussion on TLB invalidations. */ 376 - i = emit_pipe_invalidate(mask_flags, job->ring_ops_flush_tlb, dw, i); 374 + i = emit_pipe_invalidate(job->q, mask_flags, job->ring_ops_flush_tlb, dw, i); 377 375 378 376 /* hsdes: 1809175790 */ 379 377 if (has_aux_ccs(xe)) ··· 393 391 job->user_fence.value, 394 392 dw, i); 395 393 396 - i = emit_pipe_imm_ggtt(xe_lrc_seqno_ggtt_addr(lrc), seqno, lacks_render, dw, i); 394 + i = emit_pipe_imm_ggtt(job->q, xe_lrc_seqno_ggtt_addr(lrc), seqno, lacks_render, dw, i); 397 395 398 396 i = emit_user_interrupt(dw, i); 399 - 400 - i = emit_pipe_control_to_ring_end(job->q->hwe, dw, i); 401 397 402 398 xe_gt_assert(gt, i <= MAX_JOB_SIZE_DW); 403 399
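The PIPE_CONTROL emissions above repeatedly choose between a queue drain and a CS stall depending on whether the job's exec queue is part of a multi-queue group. A hypothetical helper capturing that repeated pattern (not part of the patch):

static void pipe_control_sync_flags(struct xe_exec_queue *q, u32 *flags0, u32 *flags1)
{
	if (xe_exec_queue_is_multi_queue(q))
		*flags0 |= PIPE_CONTROL0_QUEUE_DRAIN_MODE;	/* drain only this queue */
	else
		*flags1 |= PIPE_CONTROL_CS_STALL;		/* legacy full CS stall */
}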
+66 -1
drivers/gpu/drm/xe/xe_sa.c
··· 29 29 kvfree(sa_manager->cpu_ptr); 30 30 31 31 sa_manager->bo = NULL; 32 + sa_manager->shadow = NULL; 32 33 } 33 34 34 35 /** ··· 38 37 * @size: number of bytes to allocate 39 38 * @guard: number of bytes to exclude from suballocations 40 39 * @align: alignment for each suballocated chunk 40 + * @flags: flags for suballocator 41 41 * 42 42 * Prepares the suballocation manager for suballocations. 43 43 * 44 44 * Return: a pointer to the &xe_sa_manager or an ERR_PTR on failure. 45 45 */ 46 - struct xe_sa_manager *__xe_sa_bo_manager_init(struct xe_tile *tile, u32 size, u32 guard, u32 align) 46 + struct xe_sa_manager *__xe_sa_bo_manager_init(struct xe_tile *tile, u32 size, 47 + u32 guard, u32 align, u32 flags) 47 48 { 48 49 struct xe_device *xe = tile_to_xe(tile); 49 50 struct xe_sa_manager *sa_manager; ··· 82 79 memset(sa_manager->cpu_ptr, 0, bo->ttm.base.size); 83 80 } 84 81 82 + if (flags & XE_SA_BO_MANAGER_FLAG_SHADOW) { 83 + struct xe_bo *shadow; 84 + 85 + ret = drmm_mutex_init(&xe->drm, &sa_manager->swap_guard); 86 + if (ret) 87 + return ERR_PTR(ret); 88 + 89 + shadow = xe_managed_bo_create_pin_map(xe, tile, size, 90 + XE_BO_FLAG_VRAM_IF_DGFX(tile) | 91 + XE_BO_FLAG_GGTT | 92 + XE_BO_FLAG_GGTT_INVALIDATE | 93 + XE_BO_FLAG_PINNED_NORESTORE); 94 + if (IS_ERR(shadow)) { 95 + drm_err(&xe->drm, "Failed to prepare %uKiB BO for SA manager (%pe)\n", 96 + size / SZ_1K, shadow); 97 + return ERR_CAST(shadow); 98 + } 99 + sa_manager->shadow = shadow; 100 + } 101 + 85 102 drm_suballoc_manager_init(&sa_manager->base, managed_size, align); 86 103 ret = drmm_add_action_or_reset(&xe->drm, xe_sa_bo_manager_fini, 87 104 sa_manager); ··· 109 86 return ERR_PTR(ret); 110 87 111 88 return sa_manager; 89 + } 90 + 91 + /** 92 + * xe_sa_bo_swap_shadow() - Swap the SA BO with shadow BO. 93 + * @sa_manager: the XE sub allocator manager 94 + * 95 + * Swaps the sub-allocator primary buffer object with shadow buffer object. 96 + * 97 + * Return: None. 98 + */ 99 + void xe_sa_bo_swap_shadow(struct xe_sa_manager *sa_manager) 100 + { 101 + struct xe_device *xe = tile_to_xe(sa_manager->bo->tile); 102 + 103 + xe_assert(xe, sa_manager->shadow); 104 + lockdep_assert_held(&sa_manager->swap_guard); 105 + 106 + swap(sa_manager->bo, sa_manager->shadow); 107 + if (!sa_manager->bo->vmap.is_iomem) 108 + sa_manager->cpu_ptr = sa_manager->bo->vmap.vaddr; 109 + } 110 + 111 + /** 112 + * xe_sa_bo_sync_shadow() - Sync the SA Shadow BO with primary BO. 113 + * @sa_bo: the sub-allocator buffer object. 114 + * 115 + * Synchronize sub-allocator shadow buffer object with primary buffer object. 116 + * 117 + * Return: None. 118 + */ 119 + void xe_sa_bo_sync_shadow(struct drm_suballoc *sa_bo) 120 + { 121 + struct xe_sa_manager *sa_manager = to_xe_sa_manager(sa_bo->manager); 122 + struct xe_device *xe = tile_to_xe(sa_manager->bo->tile); 123 + 124 + xe_assert(xe, sa_manager->shadow); 125 + lockdep_assert_held(&sa_manager->swap_guard); 126 + 127 + xe_map_memcpy_to(xe, &sa_manager->shadow->vmap, 128 + drm_suballoc_soffset(sa_bo), 129 + xe_sa_bo_cpu_addr(sa_bo), 130 + drm_suballoc_size(sa_bo)); 112 131 } 113 132 114 133 /**
+18 -2
drivers/gpu/drm/xe/xe_sa.h
··· 14 14 struct dma_fence; 15 15 struct xe_tile; 16 16 17 - struct xe_sa_manager *__xe_sa_bo_manager_init(struct xe_tile *tile, u32 size, u32 guard, u32 align); 17 + #define XE_SA_BO_MANAGER_FLAG_SHADOW BIT(0) 18 + struct xe_sa_manager *__xe_sa_bo_manager_init(struct xe_tile *tile, u32 size, 19 + u32 guard, u32 align, u32 flags); 18 20 struct drm_suballoc *__xe_sa_bo_new(struct xe_sa_manager *sa_manager, u32 size, gfp_t gfp); 19 21 20 22 static inline struct xe_sa_manager *xe_sa_bo_manager_init(struct xe_tile *tile, u32 size, u32 align) 21 23 { 22 - return __xe_sa_bo_manager_init(tile, size, SZ_4K, align); 24 + return __xe_sa_bo_manager_init(tile, size, SZ_4K, align, 0); 23 25 } 24 26 25 27 /** ··· 69 67 { 70 68 return to_xe_sa_manager(sa->manager)->cpu_ptr + 71 69 drm_suballoc_soffset(sa); 70 + } 71 + 72 + void xe_sa_bo_swap_shadow(struct xe_sa_manager *sa_manager); 73 + void xe_sa_bo_sync_shadow(struct drm_suballoc *sa_bo); 74 + 75 + /** 76 + * xe_sa_bo_swap_guard() - Retrieve the SA BO swap guard within sub-allocator. 77 + * @sa_manager: the &xe_sa_manager 78 + * 79 + * Return: Sub alloctor swap guard mutex. 80 + */ 81 + static inline struct mutex *xe_sa_bo_swap_guard(struct xe_sa_manager *sa_manager) 82 + { 83 + return &sa_manager->swap_guard; 72 84 } 73 85 74 86 #endif
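The shadow BO introduced in the suballocator is kept coherent and swapped in only under the manager's swap_guard mutex (both helpers assert it via lockdep). A hypothetical caller sketch using only the declarations above:

static void example_switch_to_shadow(struct xe_sa_manager *mgr, struct drm_suballoc *sa_bo)
{
	mutex_lock(xe_sa_bo_swap_guard(mgr));
	xe_sa_bo_sync_shadow(sa_bo);	/* copy this suballocation into the shadow BO */
	xe_sa_bo_swap_shadow(mgr);	/* the shadow becomes the primary backing BO */
	mutex_unlock(xe_sa_bo_swap_guard(mgr));
}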
+3
drivers/gpu/drm/xe/xe_sa_types.h
··· 12 12 struct xe_sa_manager { 13 13 struct drm_suballoc_manager base; 14 14 struct xe_bo *bo; 15 + struct xe_bo *shadow; 16 + /** @swap_guard: Guard serializing updates and swaps of @bo and @shadow */ 17 + struct mutex swap_guard; 15 18 void *cpu_ptr; 16 19 bool is_iomem; 17 20 };
+1 -1
drivers/gpu/drm/xe/xe_sriov_packet.c
··· 358 358 359 359 #define MIGRATION_DESCRIPTOR_DWORDS (GUC_KLV_LEN_MIN + MIGRATION_KLV_DEVICE_DEVID_LEN + \ 360 360 GUC_KLV_LEN_MIN + MIGRATION_KLV_DEVICE_REVID_LEN) 361 - static size_t pf_descriptor_init(struct xe_device *xe, unsigned int vfid) 361 + static int pf_descriptor_init(struct xe_device *xe, unsigned int vfid) 362 362 { 363 363 struct xe_sriov_packet **desc = pf_pick_descriptor(xe, vfid); 364 364 struct xe_sriov_packet *data;
+4
drivers/gpu/drm/xe/xe_sriov_pf.c
··· 90 90 */ 91 91 int xe_sriov_pf_init_early(struct xe_device *xe) 92 92 { 93 + struct xe_mert *mert = &xe_device_get_root_tile(xe)->mert; 93 94 int err; 94 95 95 96 xe_assert(xe, IS_SRIOV_PF(xe)); ··· 111 110 xe_guard_init(&xe->sriov.pf.guard_vfs_enabling, "vfs_enabling"); 112 111 113 112 xe_sriov_pf_service_init(xe); 113 + 114 + spin_lock_init(&mert->lock); 115 + init_completion(&mert->tlb_inv_done); 114 116 115 117 return 0; 116 118 }
+2 -4
drivers/gpu/drm/xe/xe_sriov_pf_debugfs.c
··· 70 70 if (ret < 0) 71 71 return ret; 72 72 if (yes) { 73 - xe_pm_runtime_get(xe); 73 + guard(xe_pm_runtime)(xe); 74 74 ret = call(xe); 75 - xe_pm_runtime_put(xe); 76 75 } 77 76 if (ret < 0) 78 77 return ret; ··· 208 209 if (ret < 0) 209 210 return ret; 210 211 if (yes) { 211 - xe_pm_runtime_get(xe); 212 + guard(xe_pm_runtime)(xe); 212 213 ret = call(xe, vfid); 213 - xe_pm_runtime_put(xe); 214 214 } 215 215 if (ret < 0) 216 216 return ret;
+4 -12
drivers/gpu/drm/xe/xe_sriov_pf_sysfs.c
··· 389 389 struct xe_sriov_dev_attr *vattr = to_xe_sriov_dev_attr(attr); 390 390 struct xe_sriov_kobj *vkobj = to_xe_sriov_kobj(kobj); 391 391 struct xe_device *xe = vkobj->xe; 392 - ssize_t ret; 393 392 394 393 if (!vattr->store) 395 394 return -EPERM; 396 395 397 - xe_pm_runtime_get(xe); 398 - ret = xe_sriov_pf_wait_ready(xe) ?: vattr->store(xe, buf, count); 399 - xe_pm_runtime_put(xe); 400 - 401 - return ret; 396 + guard(xe_pm_runtime)(xe); 397 + return xe_sriov_pf_wait_ready(xe) ?: vattr->store(xe, buf, count); 402 398 } 403 399 404 400 static ssize_t xe_sriov_vf_attr_show(struct kobject *kobj, struct attribute *attr, char *buf) ··· 419 423 struct xe_sriov_kobj *vkobj = to_xe_sriov_kobj(kobj); 420 424 struct xe_device *xe = vkobj->xe; 421 425 unsigned int vfid = vkobj->vfid; 422 - ssize_t ret; 423 426 424 427 xe_sriov_pf_assert_vfid(xe, vfid); 425 428 426 429 if (!vattr->store) 427 430 return -EPERM; 428 431 429 - xe_pm_runtime_get(xe); 430 - ret = xe_sriov_pf_wait_ready(xe) ?: vattr->store(xe, vfid, buf, count); 431 - xe_pm_runtime_get(xe); 432 - 433 - return ret; 432 + guard(xe_pm_runtime)(xe); 433 + return xe_sriov_pf_wait_ready(xe) ?: vattr->store(xe, vfid, buf, count); 434 434 } 435 435 436 436 static const struct sysfs_ops xe_sriov_dev_sysfs_ops = {
+80 -4
drivers/gpu/drm/xe/xe_sriov_vf.c
··· 49 49 * 50 50 * As soon as Virtual GPU of the VM starts, the VF driver within receives 51 51 * the MIGRATED interrupt and schedules post-migration recovery worker. 52 - * That worker queries GuC for new provisioning (using MMIO communication), 52 + * That worker sends `VF2GUC_RESFIX_START` action along with non-zero 53 + * marker, queries GuC for new provisioning (using MMIO communication), 53 54 * and applies fixups to any non-virtualized resources used by the VF. 54 55 * 55 56 * When the VF driver is ready to continue operation on the newly connected 56 - * hardware, it sends `VF2GUC_NOTIFY_RESFIX_DONE` which causes it to 57 + * hardware, it sends `VF2GUC_RESFIX_DONE` action along with the same 58 + * marker which was sent with `VF2GUC_RESFIX_START` which causes it to 57 59 * enter the long awaited `VF_RUNNING` state, and therefore start handling 58 60 * CTB messages and scheduling workloads from the VF:: 59 61 * ··· 104 102 * | [ ] new VF provisioning [ ] 105 103 * | [ ]---------------------------> [ ] 106 104 * | | [ ] 105 + * | | VF2GUC_RESFIX_START [ ] 106 + * | [ ] <---------------------------[ ] 107 + * | [ ] [ ] 108 + * | [ ] success [ ] 109 + * | [ ]---------------------------> [ ] 107 110 * | | VF driver applies post [ ] 108 111 * | | migration fixups -------[ ] 109 112 * | | | [ ] 110 113 * | | -----> [ ] 111 114 * | | [ ] 112 - * | | VF2GUC_NOTIFY_RESFIX_DONE [ ] 115 + * | | VF2GUC_RESFIX_DONE [ ] 113 116 * | [ ] <---------------------------[ ] 114 117 * | [ ] [ ] 115 118 * | [ ] GuC sets new VF state to [ ] ··· 125 118 * | [ ]---------------------------> [ ] 126 119 * | | | 127 120 * | | | 121 + * 122 + * Handling of VF double migration flow is shown below:: 123 + * 124 + * GuC1 VF 125 + * | | 126 + * | [ ]<--- start fixups 127 + * | VF2GUC_RESFIX_START(marker) [ ] 128 + * [ ] <-------------------------------------------[ ] 129 + * [ ] [ ] 130 + * [ ]---\ [ ] 131 + * [ ] store marker [ ] 132 + * [ ]<--/ [ ] 133 + * [ ] [ ] 134 + * [ ] success [ ] 135 + * [ ] ------------------------------------------> [ ] 136 + * | [ ] 137 + * | [ ]---\ 138 + * | [ ] do fixups 139 + * | [ ]<--/ 140 + * | [ ] 141 + * -------------- VF paused / saved ---------------- 142 + * : 143 + * 144 + * GuC2 145 + * | 146 + * ----------------- VF restored ------------------ 147 + * | 148 + * [ ] 149 + * [ ]---\ 150 + * [ ] reset marker 151 + * [ ]<--/ 152 + * [ ] 153 + * ----------------- VF resumed ------------------ 154 + * | [ ] 155 + * | [ ] 156 + * | VF2GUC_RESFIX_DONE(marker) [ ] 157 + * [ ] <-------------------------------------------[ ] 158 + * [ ] [ ] 159 + * [ ]---\ [ ] 160 + * [ ] check marker [ ] 161 + * [ ] (mismatch) [ ] 162 + * [ ]<--/ [ ] 163 + * [ ] [ ] 164 + * [ ] RESPONSE_VF_MIGRATED [ ] 165 + * [ ] ------------------------------------------> [ ] 166 + * | [ ]---\ 167 + * | [ ] reschedule fixups 168 + * | [ ]<--/ 169 + * | | 128 170 */ 129 171 130 172 /** ··· 226 170 vf_migration_init_early(xe); 227 171 } 228 172 173 + static int vf_migration_init_late(struct xe_device *xe) 174 + { 175 + struct xe_gt *gt = xe_root_mmio_gt(xe); 176 + struct xe_uc_fw_version guc_version; 177 + 178 + if (!xe_sriov_vf_migration_supported(xe)) 179 + return 0; 180 + 181 + xe_gt_sriov_vf_guc_versions(gt, NULL, &guc_version); 182 + if (MAKE_GUC_VER_STRUCT(guc_version) < MAKE_GUC_VER(1, 27, 0)) { 183 + xe_sriov_vf_migration_disable(xe, 184 + "requires GuC ABI >= 1.27.0, but only %u.%u.%u found", 185 + guc_version.major, guc_version.minor, 186 + guc_version.patch); 187 + return 0; 188 + } 189 + 190 + return 
xe_sriov_vf_ccs_init(xe); 191 + } 192 + 229 193 /** 230 194 * xe_sriov_vf_init_late() - SR-IOV VF late initialization functions. 231 195 * @xe: the &xe_device to initialize ··· 256 180 */ 257 181 int xe_sriov_vf_init_late(struct xe_device *xe) 258 182 { 259 - return xe_sriov_vf_ccs_init(xe); 183 + return vf_migration_init_late(xe); 260 184 } 261 185 262 186 static int sa_info_vf_ccs(struct seq_file *m, void *data)
+19 -8
drivers/gpu/drm/xe/xe_sriov_vf_ccs.c
··· 150 150 xe_sriov_info(xe, "Allocating %s CCS BB pool size = %lldMB\n", 151 151 ctx->ctx_id ? "Restore" : "Save", bb_pool_size / SZ_1M); 152 152 153 - sa_manager = xe_sa_bo_manager_init(tile, bb_pool_size, SZ_16); 153 + sa_manager = __xe_sa_bo_manager_init(tile, bb_pool_size, SZ_4K, SZ_16, 154 + XE_SA_BO_MANAGER_FLAG_SHADOW); 154 155 155 156 if (IS_ERR(sa_manager)) { 156 157 xe_sriov_err(xe, "Suballocator init failed with error: %pe\n", ··· 163 162 offset = 0; 164 163 xe_map_memset(xe, &sa_manager->bo->vmap, offset, MI_NOOP, 165 164 bb_pool_size); 165 + xe_map_memset(xe, &sa_manager->shadow->vmap, offset, MI_NOOP, 166 + bb_pool_size); 166 167 167 168 offset = bb_pool_size - sizeof(u32); 168 169 xe_map_wr(xe, &sa_manager->bo->vmap, offset, u32, MI_BATCH_BUFFER_END); 170 + xe_map_wr(xe, &sa_manager->shadow->vmap, offset, u32, MI_BATCH_BUFFER_END); 169 171 170 172 ctx->mem.ccs_bb_pool = sa_manager; 171 173 ··· 385 381 return err; 386 382 } 387 383 384 + #define XE_SRIOV_VF_CCS_RW_BB_ADDR_OFFSET (2 * sizeof(u32)) 385 + void xe_sriov_vf_ccs_rw_update_bb_addr(struct xe_sriov_vf_ccs_ctx *ctx) 386 + { 387 + u64 addr = xe_sa_manager_gpu_addr(ctx->mem.ccs_bb_pool); 388 + struct xe_lrc *lrc = xe_exec_queue_lrc(ctx->mig_q); 389 + struct xe_device *xe = gt_to_xe(ctx->mig_q->gt); 390 + 391 + xe_device_wmb(xe); 392 + xe_map_wr(xe, &lrc->bo->vmap, XE_SRIOV_VF_CCS_RW_BB_ADDR_OFFSET, u32, addr); 393 + xe_device_wmb(xe); 394 + } 395 + 388 396 /** 389 397 * xe_sriov_vf_ccs_attach_bo - Insert CCS read write commands in the BO. 390 398 * @bo: the &buffer object to which batch buffer commands will be added. ··· 457 441 if (!bb) 458 442 continue; 459 443 460 - memset(bb->cs, MI_NOOP, bb->len * sizeof(u32)); 461 - xe_bb_free(bb, NULL); 462 - bo->bb_ccs[ctx_id] = NULL; 444 + xe_migrate_ccs_rw_copy_clear(bo, ctx_id); 463 445 } 464 446 return 0; 465 447 } ··· 477 463 if (!IS_VF_CCS_READY(xe)) 478 464 return; 479 465 480 - xe_pm_runtime_get(xe); 481 - 466 + guard(xe_pm_runtime)(xe); 482 467 for_each_ccs_rw_ctx(ctx_id) { 483 468 bb_pool = xe->sriov.vf.ccs.contexts[ctx_id].mem.ccs_bb_pool; 484 469 if (!bb_pool) ··· 488 475 drm_suballoc_dump_debug_info(&bb_pool->base, p, xe_sa_manager_gpu_addr(bb_pool)); 489 476 drm_puts(p, "\n"); 490 477 } 491 - 492 - xe_pm_runtime_put(xe); 493 478 }
+1
drivers/gpu/drm/xe/xe_sriov_vf_ccs.h
··· 20 20 int xe_sriov_vf_ccs_register_context(struct xe_device *xe); 21 21 void xe_sriov_vf_ccs_rebase(struct xe_device *xe); 22 22 void xe_sriov_vf_ccs_print(struct xe_device *xe, struct drm_printer *p); 23 + void xe_sriov_vf_ccs_rw_update_bb_addr(struct xe_sriov_vf_ccs_ctx *ctx); 23 24 24 25 static inline bool xe_sriov_vf_ccs_ready(struct xe_device *xe) 25 26 {
+183 -96
drivers/gpu/drm/xe/xe_survivability_mode.c
··· 16 16 #include "xe_heci_gsc.h" 17 17 #include "xe_i2c.h" 18 18 #include "xe_mmio.h" 19 + #include "xe_nvm.h" 19 20 #include "xe_pcode_api.h" 20 21 #include "xe_vsec.h" 21 - 22 - #define MAX_SCRATCH_MMIO 8 23 22 24 23 /** 25 24 * DOC: Survivability Mode ··· 47 48 * 48 49 * Refer :ref:`xe_configfs` for more details on how to use configfs 49 50 * 50 - * Survivability mode is indicated by the below admin-only readable sysfs which provides additional 51 - * debug information:: 51 + * Survivability mode is indicated by the below admin-only readable sysfs entry. It 52 + * provides information about the type of survivability mode (Boot/Runtime). 52 53 * 53 - * /sys/bus/pci/devices/<device>/survivability_mode 54 + * .. code-block:: shell 54 55 * 55 - * Capability Information: 56 - * Provides boot status 57 - * Postcode Information: 58 - * Provides information about the failure 59 - * Overflow Information 60 - * Provides history of previous failures 61 - * Auxiliary Information 62 - * Certain failures may have information in addition to postcode information 56 + * # cat /sys/bus/pci/devices/<device>/survivability_mode 57 + * Boot 58 + * 59 + * 60 + * Any additional debug information if present will be visible under the directory 61 + * ``survivability_info``:: 62 + * 63 + * /sys/bus/pci/devices/<device>/survivability_info/ 64 + * ├── aux_info0 65 + * ├── aux_info1 66 + * ├── aux_info2 67 + * ├── aux_info3 68 + * ├── aux_info4 69 + * ├── capability_info 70 + * ├── fdo_mode 71 + * ├── postcode_trace 72 + * └── postcode_trace_overflow 73 + * 74 + * This directory has the following attributes 75 + * 76 + * - ``capability_info`` : Indicates Boot status and support for additional information 77 + * 78 + * - ``postcode_trace``, ``postcode_trace_overflow`` : Each postcode is a 8bit value and 79 + * represents a boot failure event. When a new failure event is logged by PCODE the 80 + * existing postcodes are shifted left. These entries provide a history of 8 postcodes. 81 + * 82 + * - ``aux_info<n>`` : Some failures have additional debug information 83 + * 84 + * - ``fdo_mode`` : To allow recovery in scenarios where MEI itself fails, a new SPI Flash 85 + * Descriptor Override (FDO) mode is added in v2 survivability breadcrumbs. This mode is enabled 86 + * by PCODE and provides the ability to directly update the firmware via SPI Driver without 87 + * any dependency on MEI. Xe KMD initializes the nvm aux driver if FDO mode is enabled. 63 88 * 64 89 * Runtime Survivability 65 90 * ===================== ··· 91 68 * Certain runtime firmware errors can cause the device to enter a wedged state 92 69 * (:ref:`xe-device-wedging`) requiring a firmware flash to restore normal operation. 93 70 * Runtime Survivability Mode indicates that a firmware flash is necessary to recover the device and 94 - * is indicated by the presence of survivability mode sysfs:: 95 - * 96 - * /sys/bus/pci/devices/<device>/survivability_mode 97 - * 71 + * is indicated by the presence of survivability mode sysfs. 98 72 * Survivability mode sysfs provides information about the type of survivability mode. 73 + * 74 + * .. code-block:: shell 75 + * 76 + * # cat /sys/bus/pci/devices/<device>/survivability_mode 77 + * Runtime 99 78 * 100 79 * When such errors occur, userspace is notified with the drm device wedged uevent and runtime 101 80 * survivability mode. User can then initiate a firmware flash using userspace tools like fwupd 102 81 * to restore device to normal operation. 
103 82 */ 104 83 105 - static u32 aux_history_offset(u32 reg_value) 84 + static const char * const reg_map[] = { 85 + [CAPABILITY_INFO] = "Capability Info", 86 + [POSTCODE_TRACE] = "Postcode trace", 87 + [POSTCODE_TRACE_OVERFLOW] = "Postcode trace overflow", 88 + [AUX_INFO0] = "Auxiliary Info 0", 89 + [AUX_INFO1] = "Auxiliary Info 1", 90 + [AUX_INFO2] = "Auxiliary Info 2", 91 + [AUX_INFO3] = "Auxiliary Info 3", 92 + [AUX_INFO4] = "Auxiliary Info 4", 93 + }; 94 + 95 + #define FDO_INFO (MAX_SCRATCH_REG + 1) 96 + 97 + struct xe_survivability_attribute { 98 + struct device_attribute attr; 99 + u8 index; 100 + }; 101 + 102 + static struct 103 + xe_survivability_attribute *dev_attr_to_survivability_attr(struct device_attribute *attr) 106 104 { 107 - return REG_FIELD_GET(AUXINFO_HISTORY_OFFSET, reg_value); 105 + return container_of(attr, struct xe_survivability_attribute, attr); 108 106 } 109 107 110 - static void set_survivability_info(struct xe_mmio *mmio, struct xe_survivability_info *info, 111 - int id, char *name) 108 + static void set_survivability_info(struct xe_mmio *mmio, u32 *info, int id) 112 109 { 113 - strscpy(info[id].name, name, sizeof(info[id].name)); 114 - info[id].reg = PCODE_SCRATCH(id).raw; 115 - info[id].value = xe_mmio_read32(mmio, PCODE_SCRATCH(id)); 110 + info[id] = xe_mmio_read32(mmio, PCODE_SCRATCH(id)); 116 111 } 117 112 118 113 static void populate_survivability_info(struct xe_device *xe) 119 114 { 120 115 struct xe_survivability *survivability = &xe->survivability; 121 - struct xe_survivability_info *info = survivability->info; 116 + u32 *info = survivability->info; 122 117 struct xe_mmio *mmio; 123 118 u32 id = 0, reg_value; 124 - char name[NAME_MAX]; 125 - int index; 126 119 127 120 mmio = xe_root_tile_mmio(xe); 128 - set_survivability_info(mmio, info, id, "Capability Info"); 129 - reg_value = info[id].value; 121 + set_survivability_info(mmio, info, CAPABILITY_INFO); 122 + reg_value = info[CAPABILITY_INFO]; 123 + 124 + survivability->version = REG_FIELD_GET(BREADCRUMB_VERSION, reg_value); 125 + /* FDO mode is exposed only from version 2 */ 126 + if (survivability->version >= 2) 127 + survivability->fdo_mode = REG_FIELD_GET(FDO_MODE, reg_value); 130 128 131 129 if (reg_value & HISTORY_TRACKING) { 132 - id++; 133 - set_survivability_info(mmio, info, id, "Postcode Info"); 130 + set_survivability_info(mmio, info, POSTCODE_TRACE); 134 131 135 - if (reg_value & OVERFLOW_SUPPORT) { 136 - id = REG_FIELD_GET(OVERFLOW_REG_OFFSET, reg_value); 137 - set_survivability_info(mmio, info, id, "Overflow Info"); 138 - } 132 + if (reg_value & OVERFLOW_SUPPORT) 133 + set_survivability_info(mmio, info, POSTCODE_TRACE_OVERFLOW); 139 134 } 140 135 136 + /* Traverse the linked list of aux info registers */ 141 137 if (reg_value & AUXINFO_SUPPORT) { 142 - id = REG_FIELD_GET(AUXINFO_REG_OFFSET, reg_value); 143 - 144 - for (index = 0; id && reg_value; index++, reg_value = info[id].value, 145 - id = aux_history_offset(reg_value)) { 146 - snprintf(name, NAME_MAX, "Auxiliary Info %d", index); 147 - set_survivability_info(mmio, info, id, name); 148 - } 138 + for (id = REG_FIELD_GET(AUXINFO_REG_OFFSET, reg_value); 139 + id >= AUX_INFO0 && id < MAX_SCRATCH_REG; 140 + id = REG_FIELD_GET(AUXINFO_HISTORY_OFFSET, info[id])) 141 + set_survivability_info(mmio, info, id); 149 142 } 150 143 } 151 144 ··· 169 130 { 170 131 struct xe_device *xe = pdev_to_xe_device(pdev); 171 132 struct xe_survivability *survivability = &xe->survivability; 172 - struct xe_survivability_info *info = survivability->info; 133 + 
u32 *info = survivability->info; 173 134 int id; 174 135 175 136 dev_info(&pdev->dev, "Survivability Boot Status : Critical Failure (%d)\n", 176 137 survivability->boot_status); 177 - for (id = 0; id < MAX_SCRATCH_MMIO; id++) { 178 - if (info[id].reg) 179 - dev_info(&pdev->dev, "%s: 0x%x - 0x%x\n", info[id].name, 180 - info[id].reg, info[id].value); 138 + for (id = 0; id < MAX_SCRATCH_REG; id++) { 139 + if (info[id]) 140 + dev_info(&pdev->dev, "%s: 0x%x\n", reg_map[id], info[id]); 181 141 } 182 142 } 183 143 ··· 194 156 struct pci_dev *pdev = to_pci_dev(dev); 195 157 struct xe_device *xe = pdev_to_xe_device(pdev); 196 158 struct xe_survivability *survivability = &xe->survivability; 197 - struct xe_survivability_info *info = survivability->info; 198 - int index = 0, count = 0; 199 159 200 - count += sysfs_emit_at(buff, count, "Survivability mode type: %s\n", 201 - survivability->type ? "Runtime" : "Boot"); 202 - 203 - if (!check_boot_failure(xe)) 204 - return count; 205 - 206 - for (index = 0; index < MAX_SCRATCH_MMIO; index++) { 207 - if (info[index].reg) 208 - count += sysfs_emit_at(buff, count, "%s: 0x%x - 0x%x\n", info[index].name, 209 - info[index].reg, info[index].value); 210 - } 211 - 212 - return count; 160 + return sysfs_emit(buff, "%s\n", survivability->type ? "Runtime" : "Boot"); 213 161 } 214 162 215 163 static DEVICE_ATTR_ADMIN_RO(survivability_mode); 216 164 165 + static ssize_t survivability_info_show(struct device *dev, 166 + struct device_attribute *attr, char *buff) 167 + { 168 + struct xe_survivability_attribute *sa = dev_attr_to_survivability_attr(attr); 169 + struct pci_dev *pdev = to_pci_dev(dev); 170 + struct xe_device *xe = pdev_to_xe_device(pdev); 171 + struct xe_survivability *survivability = &xe->survivability; 172 + u32 *info = survivability->info; 173 + 174 + if (sa->index == FDO_INFO) 175 + return sysfs_emit(buff, "%s\n", str_enabled_disabled(survivability->fdo_mode)); 176 + 177 + return sysfs_emit(buff, "0x%x\n", info[sa->index]); 178 + } 179 + 180 + #define SURVIVABILITY_ATTR_RO(name, _index) \ 181 + struct xe_survivability_attribute attr_##name = { \ 182 + .attr = __ATTR(name, 0400, survivability_info_show, NULL), \ 183 + .index = _index, \ 184 + } 185 + 186 + static SURVIVABILITY_ATTR_RO(capability_info, CAPABILITY_INFO); 187 + static SURVIVABILITY_ATTR_RO(postcode_trace, POSTCODE_TRACE); 188 + static SURVIVABILITY_ATTR_RO(postcode_trace_overflow, POSTCODE_TRACE_OVERFLOW); 189 + static SURVIVABILITY_ATTR_RO(aux_info0, AUX_INFO0); 190 + static SURVIVABILITY_ATTR_RO(aux_info1, AUX_INFO1); 191 + static SURVIVABILITY_ATTR_RO(aux_info2, AUX_INFO2); 192 + static SURVIVABILITY_ATTR_RO(aux_info3, AUX_INFO3); 193 + static SURVIVABILITY_ATTR_RO(aux_info4, AUX_INFO4); 194 + static SURVIVABILITY_ATTR_RO(fdo_mode, FDO_INFO); 195 + 217 196 static void xe_survivability_mode_fini(void *arg) 218 197 { 219 198 struct xe_device *xe = arg; 199 + struct xe_survivability *survivability = &xe->survivability; 220 200 struct pci_dev *pdev = to_pci_dev(xe->drm.dev); 221 201 struct device *dev = &pdev->dev; 222 202 223 - sysfs_remove_file(&dev->kobj, &dev_attr_survivability_mode.attr); 203 + if (survivability->fdo_mode) 204 + xe_nvm_fini(xe); 205 + 206 + device_remove_file(dev, &dev_attr_survivability_mode); 224 207 } 208 + 209 + static umode_t survivability_info_attrs_visible(struct kobject *kobj, struct attribute *attr, 210 + int idx) 211 + { 212 + struct xe_device *xe = kdev_to_xe_device(kobj_to_dev(kobj)); 213 + struct xe_survivability *survivability = &xe->survivability; 214 + 
u32 *info = survivability->info; 215 + 216 + /* 217 + * Last index in survivability_info_attrs is fdo mode and is applicable only in 218 + * version 2 of survivability mode 219 + */ 220 + if (idx == MAX_SCRATCH_REG && survivability->version >= 2) 221 + return 0400; 222 + 223 + if (idx < MAX_SCRATCH_REG && info[idx]) 224 + return 0400; 225 + 226 + return 0; 227 + } 228 + 229 + /* Attributes are ordered according to enum scratch_reg */ 230 + static struct attribute *survivability_info_attrs[] = { 231 + &attr_capability_info.attr.attr, 232 + &attr_postcode_trace.attr.attr, 233 + &attr_postcode_trace_overflow.attr.attr, 234 + &attr_aux_info0.attr.attr, 235 + &attr_aux_info1.attr.attr, 236 + &attr_aux_info2.attr.attr, 237 + &attr_aux_info3.attr.attr, 238 + &attr_aux_info4.attr.attr, 239 + &attr_fdo_mode.attr.attr, 240 + NULL, 241 + }; 242 + 243 + static const struct attribute_group survivability_info_group = { 244 + .name = "survivability_info", 245 + .attrs = survivability_info_attrs, 246 + .is_visible = survivability_info_attrs_visible, 247 + }; 225 248 226 249 static int create_survivability_sysfs(struct pci_dev *pdev) 227 250 { ··· 290 191 struct xe_device *xe = pdev_to_xe_device(pdev); 291 192 int ret; 292 193 293 - /* create survivability mode sysfs */ 294 - ret = sysfs_create_file(&dev->kobj, &dev_attr_survivability_mode.attr); 194 + ret = device_create_file(dev, &dev_attr_survivability_mode); 295 195 if (ret) { 296 196 dev_warn(dev, "Failed to create survivability sysfs files\n"); 297 197 return ret; ··· 300 202 xe_survivability_mode_fini, xe); 301 203 if (ret) 302 204 return ret; 205 + 206 + if (check_boot_failure(xe)) { 207 + ret = devm_device_add_group(dev, &survivability_info_group); 208 + if (ret) 209 + return ret; 210 + } 303 211 304 212 return 0; 305 213 } ··· 324 220 /* Make sure xe_heci_gsc_init() knows about survivability mode */ 325 221 survivability->mode = true; 326 222 327 - ret = xe_heci_gsc_init(xe); 328 - if (ret) 329 - goto err; 223 + xe_heci_gsc_init(xe); 330 224 331 225 xe_vsec_init(xe); 226 + 227 + if (survivability->fdo_mode) { 228 + ret = xe_nvm_init(xe); 229 + if (ret) 230 + goto err; 231 + } 332 232 333 233 ret = xe_i2c_probe(xe); 334 234 if (ret) ··· 343 235 return 0; 344 236 345 237 err: 238 + dev_err(dev, "Failed to enable Survivability Mode\n"); 346 239 survivability->mode = false; 347 240 return ret; 348 - } 349 - 350 - static int init_survivability_mode(struct xe_device *xe) 351 - { 352 - struct xe_survivability *survivability = &xe->survivability; 353 - struct xe_survivability_info *info; 354 - 355 - survivability->size = MAX_SCRATCH_MMIO; 356 - 357 - info = devm_kcalloc(xe->drm.dev, survivability->size, sizeof(*info), 358 - GFP_KERNEL); 359 - if (!info) 360 - return -ENOMEM; 361 - 362 - survivability->info = info; 363 - 364 - populate_survivability_info(xe); 365 - 366 - return 0; 367 241 } 368 242 369 243 /** ··· 415 325 return -EINVAL; 416 326 } 417 327 418 - ret = init_survivability_mode(xe); 419 - if (ret) 420 - return ret; 328 + populate_survivability_info(xe); 421 329 422 330 ret = create_survivability_sysfs(pdev); 423 331 if (ret) ··· 444 356 { 445 357 struct xe_survivability *survivability = &xe->survivability; 446 358 struct pci_dev *pdev = to_pci_dev(xe->drm.dev); 447 - int ret; 448 359 449 360 if (!xe_survivability_mode_is_requested(xe)) 450 361 return 0; 451 362 452 - ret = init_survivability_mode(xe); 453 - if (ret) 454 - return ret; 363 + populate_survivability_info(xe); 455 364 456 - /* Log breadcrumbs but do not enter survivability mode 
for Critical boot errors */ 457 - if (survivability->boot_status == CRITICAL_FAILURE) { 365 + /* 366 + * v2 supports survivability mode for critical errors 367 + */ 368 + if (survivability->version < 2 && survivability->boot_status == CRITICAL_FAILURE) { 458 369 log_survivability_info(pdev); 459 370 return -ENXIO; 460 371 }
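As a companion to the sysfs layout documented above, a small userspace sketch that dumps the survivability entries. The PCI address 0000:03:00.0 is only a placeholder for the affected device, and the survivability_info directory is expected to exist only after a boot failure:

```c
#include <dirent.h>
#include <stdio.h>

#define DEV "/sys/bus/pci/devices/0000:03:00.0"

static void dump_file(const char *path)
{
	char buf[128] = {0};
	FILE *f = fopen(path, "r");

	if (!f)
		return;
	if (fgets(buf, sizeof(buf), f))
		printf("%s: %s", path, buf);
	fclose(f);
}

int main(void)
{
	char path[512];
	struct dirent *d;
	DIR *dir;

	/* "Boot" or "Runtime"; the file is absent on a healthy device */
	dump_file(DEV "/survivability_mode");

	/* Additional breadcrumbs, present only after a boot failure */
	dir = opendir(DEV "/survivability_info");
	if (!dir)
		return 0;
	while ((d = readdir(dir))) {
		if (d->d_name[0] == '.')
			continue;
		snprintf(path, sizeof(path), DEV "/survivability_info/%s", d->d_name);
		dump_file(path);
	}
	closedir(dir);
	return 0;
}
```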
+20 -8
drivers/gpu/drm/xe/xe_survivability_mode_types.h
··· 9 9 #include <linux/limits.h> 10 10 #include <linux/types.h> 11 11 12 + enum scratch_reg { 13 + CAPABILITY_INFO, 14 + POSTCODE_TRACE, 15 + POSTCODE_TRACE_OVERFLOW, 16 + AUX_INFO0, 17 + AUX_INFO1, 18 + AUX_INFO2, 19 + AUX_INFO3, 20 + AUX_INFO4, 21 + MAX_SCRATCH_REG, 22 + }; 23 + 12 24 enum xe_survivability_type { 13 25 XE_SURVIVABILITY_TYPE_BOOT, 14 26 XE_SURVIVABILITY_TYPE_RUNTIME, 15 - }; 16 - 17 - struct xe_survivability_info { 18 - char name[NAME_MAX]; 19 - u32 reg; 20 - u32 value; 21 27 }; 22 28 23 29 /** 24 30 * struct xe_survivability: Contains survivability mode information 25 31 */ 26 32 struct xe_survivability { 27 - /** @info: struct that holds survivability info from scratch registers */ 28 - struct xe_survivability_info *info; 33 + /** @info: survivability debug info */ 34 + u32 info[MAX_SCRATCH_REG]; 29 35 30 36 /** @size: number of scratch registers */ 31 37 u32 size; ··· 44 38 45 39 /** @type: survivability type */ 46 40 enum xe_survivability_type type; 41 + 42 + /** @fdo_mode: indicates if FDO mode is enabled */ 43 + bool fdo_mode; 44 + 45 + /** @version: breadcrumb version of survivability mode */ 46 + u8 version; 47 47 }; 48 48 49 49 #endif /* _XE_SURVIVABILITY_MODE_TYPES_H_ */
+42 -46
drivers/gpu/drm/xe/xe_svm.c
··· 176 176 mmu_range); 177 177 } 178 178 179 - static s64 xe_svm_stats_ktime_us_delta(ktime_t start) 180 - { 181 - return IS_ENABLED(CONFIG_DEBUG_FS) ? 182 - ktime_us_delta(ktime_get(), start) : 0; 183 - } 184 - 185 179 static void xe_svm_tlb_inval_us_stats_incr(struct xe_gt *gt, ktime_t start) 186 180 { 187 - s64 us_delta = xe_svm_stats_ktime_us_delta(start); 181 + s64 us_delta = xe_gt_stats_ktime_us_delta(start); 188 182 189 183 xe_gt_stats_incr(gt, XE_GT_STATS_ID_SVM_TLB_INVAL_US, us_delta); 190 - } 191 - 192 - static ktime_t xe_svm_stats_ktime_get(void) 193 - { 194 - return IS_ENABLED(CONFIG_DEBUG_FS) ? ktime_get() : 0; 195 184 } 196 185 197 186 static void xe_svm_invalidate(struct drm_gpusvm *gpusvm, ··· 191 202 struct xe_device *xe = vm->xe; 192 203 struct drm_gpusvm_range *r, *first; 193 204 struct xe_tile *tile; 194 - ktime_t start = xe_svm_stats_ktime_get(); 205 + ktime_t start = xe_gt_stats_ktime_get(); 195 206 u64 adj_start = mmu_range->start, adj_end = mmu_range->end; 196 207 u8 tile_mask = 0, id; 197 208 long err; ··· 274 285 return 0; 275 286 } 276 287 277 - static int xe_svm_range_set_default_attr(struct xe_vm *vm, u64 range_start, u64 range_end) 288 + static void xe_vma_set_default_attributes(struct xe_vma *vma) 289 + { 290 + vma->attr.preferred_loc.devmem_fd = DRM_XE_PREFERRED_LOC_DEFAULT_DEVICE; 291 + vma->attr.preferred_loc.migration_policy = DRM_XE_MIGRATE_ALL_PAGES; 292 + vma->attr.pat_index = vma->attr.default_pat_index; 293 + vma->attr.atomic_access = DRM_XE_ATOMIC_UNDEFINED; 294 + } 295 + 296 + static int xe_svm_range_set_default_attr(struct xe_vm *vm, u64 start, u64 end) 278 297 { 279 298 struct xe_vma *vma; 280 - struct xe_vma_mem_attr default_attr = { 281 - .preferred_loc = { 282 - .devmem_fd = DRM_XE_PREFERRED_LOC_DEFAULT_DEVICE, 283 - .migration_policy = DRM_XE_MIGRATE_ALL_PAGES, 284 - }, 285 - .atomic_access = DRM_XE_ATOMIC_UNDEFINED, 286 - }; 287 - int err = 0; 299 + bool has_default_attr; 300 + int err; 288 301 289 - vma = xe_vm_find_vma_by_addr(vm, range_start); 302 + vma = xe_vm_find_vma_by_addr(vm, start); 290 303 if (!vma) 291 304 return -EINVAL; 292 305 ··· 297 306 return 0; 298 307 } 299 308 300 - if (xe_vma_has_default_mem_attrs(vma)) 301 - return 0; 302 - 303 309 vm_dbg(&vm->xe->drm, "Existing VMA start=0x%016llx, vma_end=0x%016llx", 304 310 xe_vma_start(vma), xe_vma_end(vma)); 305 311 306 - if (xe_vma_start(vma) == range_start && xe_vma_end(vma) == range_end) { 307 - default_attr.pat_index = vma->attr.default_pat_index; 308 - default_attr.default_pat_index = vma->attr.default_pat_index; 309 - vma->attr = default_attr; 310 - } else { 311 - vm_dbg(&vm->xe->drm, "Split VMA start=0x%016llx, vma_end=0x%016llx", 312 - range_start, range_end); 313 - err = xe_vm_alloc_cpu_addr_mirror_vma(vm, range_start, range_end - range_start); 314 - if (err) { 315 - drm_warn(&vm->xe->drm, "VMA SPLIT failed: %pe\n", ERR_PTR(err)); 316 - xe_vm_kill(vm, true); 317 - return err; 318 - } 312 + has_default_attr = xe_vma_has_default_mem_attrs(vma); 313 + 314 + if (has_default_attr) { 315 + start = xe_vma_start(vma); 316 + end = xe_vma_end(vma); 317 + } else if (xe_vma_start(vma) == start && xe_vma_end(vma) == end) { 318 + xe_vma_set_default_attributes(vma); 319 + } 320 + 321 + xe_vm_find_cpu_addr_mirror_vma_range(vm, &start, &end); 322 + 323 + if (xe_vma_start(vma) == start && xe_vma_end(vma) == end && has_default_attr) 324 + return 0; 325 + 326 + vm_dbg(&vm->xe->drm, "New VMA start=0x%016llx, vma_end=0x%016llx", start, end); 327 + 328 + err = 
xe_vm_alloc_cpu_addr_mirror_vma(vm, start, end - start); 329 + if (err) { 330 + drm_warn(&vm->xe->drm, "New VMA MAP failed: %pe\n", ERR_PTR(err)); 331 + xe_vm_kill(vm, true); 332 + return err; 319 333 } 320 334 321 335 /* ··· 431 435 unsigned long npages, 432 436 ktime_t start) 433 437 { 434 - s64 us_delta = xe_svm_stats_ktime_us_delta(start); 438 + s64 us_delta = xe_gt_stats_ktime_us_delta(start); 435 439 436 440 if (dir == XE_SVM_COPY_TO_VRAM) { 437 441 switch (npages) { ··· 483 487 u64 vram_addr = XE_VRAM_ADDR_INVALID; 484 488 int err = 0, pos = 0; 485 489 bool sram = dir == XE_SVM_COPY_TO_SRAM; 486 - ktime_t start = xe_svm_stats_ktime_get(); 490 + ktime_t start = xe_gt_stats_ktime_get(); 487 491 488 492 /* 489 493 * This flow is complex: it locates physically contiguous device pages, ··· 975 979 struct xe_svm_range *range, \ 976 980 ktime_t start) \ 977 981 { \ 978 - s64 us_delta = xe_svm_stats_ktime_us_delta(start); \ 982 + s64 us_delta = xe_gt_stats_ktime_us_delta(start); \ 979 983 \ 980 984 switch (xe_svm_range_size(range)) { \ 981 985 case SZ_4K: \ ··· 1020 1024 struct drm_pagemap *dpagemap; 1021 1025 struct xe_tile *tile = gt_to_tile(gt); 1022 1026 int migrate_try_count = ctx.devmem_only ? 3 : 1; 1023 - ktime_t start = xe_svm_stats_ktime_get(), bind_start, get_pages_start; 1027 + ktime_t start = xe_gt_stats_ktime_get(), bind_start, get_pages_start; 1024 1028 int err; 1025 1029 1026 1030 lockdep_assert_held_write(&vm->lock); ··· 1059 1063 1060 1064 if (--migrate_try_count >= 0 && 1061 1065 xe_svm_range_needs_migrate_to_vram(range, vma, !!dpagemap || ctx.devmem_only)) { 1062 - ktime_t migrate_start = xe_svm_stats_ktime_get(); 1066 + ktime_t migrate_start = xe_gt_stats_ktime_get(); 1063 1067 1064 1068 /* TODO : For multi-device dpagemap will be used to find the 1065 1069 * remote tile and remote device. Will need to modify ··· 1096 1100 } 1097 1101 1098 1102 get_pages: 1099 - get_pages_start = xe_svm_stats_ktime_get(); 1103 + get_pages_start = xe_gt_stats_ktime_get(); 1100 1104 1101 1105 range_debug(range, "GET PAGES"); 1102 1106 err = xe_svm_range_get_pages(vm, range, &ctx); ··· 1123 1127 xe_svm_range_get_pages_us_stats_incr(gt, range, get_pages_start); 1124 1128 range_debug(range, "PAGE FAULT - BIND"); 1125 1129 1126 - bind_start = xe_svm_stats_ktime_get(); 1130 + bind_start = xe_gt_stats_ktime_get(); 1127 1131 xe_validation_guard(&vctx, &vm->xe->val, &exec, (struct xe_val_flags) {}, err) { 1128 1132 err = xe_vm_drm_exec_lock(vm, &exec); 1129 1133 drm_exec_retry_on_contention(&exec);
+31 -2
drivers/gpu/drm/xe/xe_sync.c
··· 228 228 return 0; 229 229 } 230 230 231 + /** 232 + * xe_sync_entry_wait() - Wait on in-sync 233 + * @sync: Sync object 234 + * 235 + * If the sync is in an in-sync, wait on the sync to signal. 236 + * 237 + * Return: 0 on success, -ERESTARTSYS on failure (interruption) 238 + */ 239 + int xe_sync_entry_wait(struct xe_sync_entry *sync) 240 + { 241 + return xe_sync_needs_wait(sync) ? 242 + dma_fence_wait(sync->fence, true) : 0; 243 + } 244 + 245 + /** 246 + * xe_sync_needs_wait() - Sync needs a wait (input dma-fence not signaled) 247 + * @sync: Sync object 248 + * 249 + * Return: True if sync needs a wait, False otherwise 250 + */ 251 + bool xe_sync_needs_wait(struct xe_sync_entry *sync) 252 + { 253 + return sync->fence && 254 + !test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &sync->fence->flags); 255 + } 256 + 231 257 void xe_sync_entry_signal(struct xe_sync_entry *sync, struct dma_fence *fence) 232 258 { 233 259 if (!(sync->flags & DRM_XE_SYNC_FLAG_SIGNAL)) ··· 337 311 struct xe_tile *tile; 338 312 u8 id; 339 313 340 - for_each_tile(tile, vm->xe, id) 341 - num_fence += (1 + XE_MAX_GT_PER_TILE); 314 + for_each_tile(tile, vm->xe, id) { 315 + num_fence++; 316 + for_each_tlb_inval(i) 317 + num_fence++; 318 + } 342 319 343 320 fences = kmalloc_array(num_fence, sizeof(*fences), 344 321 GFP_KERNEL);
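The two helpers added above make the "wait only if the input fence is still unsignaled" pattern explicit. A hedged sketch of how a caller might apply them to a parsed sync array; the wait_for_in_syncs() wrapper and its arguments are illustrative and not part of this series:

```c
static int wait_for_in_syncs(struct xe_sync_entry *syncs, int num_syncs)
{
	int i, err;

	for (i = 0; i < num_syncs; i++) {
		/* Skip signaled or absent input fences */
		if (!xe_sync_needs_wait(&syncs[i]))
			continue;

		/* Interruptible wait; returns -ERESTARTSYS if interrupted */
		err = xe_sync_entry_wait(&syncs[i]);
		if (err)
			return err;
	}

	return 0;
}
```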
+2
drivers/gpu/drm/xe/xe_sync.h
··· 29 29 struct xe_sched_job *job); 30 30 void xe_sync_entry_signal(struct xe_sync_entry *sync, 31 31 struct dma_fence *fence); 32 + int xe_sync_entry_wait(struct xe_sync_entry *sync); 33 + bool xe_sync_needs_wait(struct xe_sync_entry *sync); 32 34 void xe_sync_entry_cleanup(struct xe_sync_entry *sync); 33 35 struct dma_fence * 34 36 xe_sync_in_fence_get(struct xe_sync_entry *sync, int num_sync,
+5
drivers/gpu/drm/xe/xe_tile.c
··· 209 209 if (IS_ERR(tile->mem.kernel_bb_pool)) 210 210 return PTR_ERR(tile->mem.kernel_bb_pool); 211 211 212 + /* Optimistically anticipate at most 256 TLB fences with PRL */ 213 + tile->mem.reclaim_pool = xe_sa_bo_manager_init(tile, SZ_1M, XE_PAGE_RECLAIM_LIST_MAX_SIZE); 214 + if (IS_ERR(tile->mem.reclaim_pool)) 215 + return PTR_ERR(tile->mem.reclaim_pool); 216 + 212 217 return 0; 213 218 } 214 219 void xe_tile_migrate_wait(struct xe_tile *tile)
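The "at most 256 TLB fences" figure in the comment above follows directly from the pool sizing, assuming XE_PAGE_RECLAIM_LIST_MAX_SIZE is one 4 KiB page per reclaim list (an assumption; the actual constant is not shown in this hunk):

```c
#include <stdio.h>

int main(void)
{
	const unsigned long pool_size = 1UL << 20;    /* SZ_1M reclaim pool */
	const unsigned long prl_max_size = 1UL << 12; /* assumed 4 KiB per page reclaim list */

	/* 1 MiB / 4 KiB = 256 concurrent lists, matching the comment */
	printf("max in-flight reclaim lists: %lu\n", pool_size / prl_max_size);
	return 0;
}
```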
+11 -6
drivers/gpu/drm/xe/xe_tile_debugfs.c
··· 82 82 struct drm_info_node *node = m->private; 83 83 struct xe_tile *tile = node_to_tile(node); 84 84 struct xe_device *xe = tile_to_xe(tile); 85 - int ret; 86 85 87 - xe_pm_runtime_get(xe); 88 - ret = xe_tile_debugfs_simple_show(m, data); 89 - xe_pm_runtime_put(xe); 90 - 91 - return ret; 86 + guard(xe_pm_runtime)(xe); 87 + return xe_tile_debugfs_simple_show(m, data); 92 88 } 93 89 94 90 static int ggtt(struct xe_tile *tile, struct drm_printer *p) ··· 105 109 { "ggtt", .show = xe_tile_debugfs_show_with_rpm, .data = ggtt }, 106 110 { "sa_info", .show = xe_tile_debugfs_show_with_rpm, .data = sa_info }, 107 111 }; 112 + 113 + static void tile_debugfs_create_vram_mm(struct xe_tile *tile) 114 + { 115 + if (tile->mem.vram) 116 + ttm_resource_manager_create_debugfs(&tile->mem.vram->ttm.manager, tile->debugfs, 117 + "vram_mm"); 118 + } 108 119 109 120 /** 110 121 * xe_tile_debugfs_register - Register tile's debugfs attributes ··· 142 139 drm_debugfs_create_files(vf_safe_debugfs_list, 143 140 ARRAY_SIZE(vf_safe_debugfs_list), 144 141 tile->debugfs, minor); 142 + 143 + tile_debugfs_create_vram_mm(tile); 145 144 }
+1 -2
drivers/gpu/drm/xe/xe_tile_sriov_pf_debugfs.c
··· 141 141 if (val > (TYPE)~0ull) \ 142 142 return -EOVERFLOW; \ 143 143 \ 144 - xe_pm_runtime_get(xe); \ 144 + guard(xe_pm_runtime)(xe); \ 145 145 err = xe_sriov_pf_wait_ready(xe) ?: \ 146 146 xe_gt_sriov_pf_config_set_##CONFIG(gt, vfid, val); \ 147 147 if (!err) \ 148 148 xe_sriov_pf_provision_set_custom_mode(xe); \ 149 - xe_pm_runtime_put(xe); \ 150 149 \ 151 150 return err; \ 152 151 } \
+24 -3
drivers/gpu/drm/xe/xe_tlb_inval.c
··· 199 199 mutex_unlock(&tlb_inval->seqno_lock); 200 200 } 201 201 202 + /** 203 + * xe_tlb_inval_reset_timeout() - Reset TLB inval fence timeout 204 + * @tlb_inval: TLB invalidation client 205 + * 206 + * Reset the TLB invalidation timeout timer. 207 + */ 208 + static void xe_tlb_inval_reset_timeout(struct xe_tlb_inval *tlb_inval) 209 + { 210 + lockdep_assert_held(&tlb_inval->pending_lock); 211 + 212 + mod_delayed_work(system_wq, &tlb_inval->fence_tdr, 213 + tlb_inval->ops->timeout_delay(tlb_inval)); 214 + } 215 + 202 216 static bool xe_tlb_inval_seqno_past(struct xe_tlb_inval *tlb_inval, int seqno) 203 217 { 204 218 int seqno_recv = READ_ONCE(tlb_inval->seqno_recv); ··· 313 299 * @start: start address 314 300 * @end: end address 315 301 * @asid: address space id 302 + * @prl_sa: suballocation of page reclaim list if used, NULL indicates PPC flush 316 303 * 317 304 * Issue a range based TLB invalidation if supported, if not fallback to a full 318 305 * TLB invalidation. Completion of TLB is asynchronous and caller can use ··· 323 308 */ 324 309 int xe_tlb_inval_range(struct xe_tlb_inval *tlb_inval, 325 310 struct xe_tlb_inval_fence *fence, u64 start, u64 end, 326 - u32 asid) 311 + u32 asid, struct drm_suballoc *prl_sa) 327 312 { 328 313 return xe_tlb_inval_issue(tlb_inval, fence, tlb_inval->ops->ppgtt, 329 - start, end, asid); 314 + start, end, asid, prl_sa); 330 315 } 331 316 332 317 /** ··· 342 327 u64 range = 1ull << vm->xe->info.va_bits; 343 328 344 329 xe_tlb_inval_fence_init(tlb_inval, &fence, true); 345 - xe_tlb_inval_range(tlb_inval, &fence, 0, range, vm->usm.asid); 330 + xe_tlb_inval_range(tlb_inval, &fence, 0, range, vm->usm.asid, NULL); 346 331 xe_tlb_inval_fence_wait(&fence); 347 332 } 348 333 ··· 375 360 * process_g2h_msg(). 376 361 */ 377 362 spin_lock_irqsave(&tlb_inval->pending_lock, flags); 363 + if (seqno == TLB_INVALIDATION_SEQNO_INVALID) { 364 + xe_tlb_inval_reset_timeout(tlb_inval); 365 + spin_unlock_irqrestore(&tlb_inval->pending_lock, flags); 366 + return; 367 + } 368 + 378 369 if (xe_tlb_inval_seqno_past(tlb_inval, seqno)) { 379 370 spin_unlock_irqrestore(&tlb_inval->pending_lock, flags); 380 371 return;
+1 -1
drivers/gpu/drm/xe/xe_tlb_inval.h
··· 23 23 void xe_tlb_inval_vm(struct xe_tlb_inval *tlb_inval, struct xe_vm *vm); 24 24 int xe_tlb_inval_range(struct xe_tlb_inval *tlb_inval, 25 25 struct xe_tlb_inval_fence *fence, 26 - u64 start, u64 end, u32 asid); 26 + u64 start, u64 end, u32 asid, struct drm_suballoc *prl_sa); 27 27 28 28 void xe_tlb_inval_fence_init(struct xe_tlb_inval *tlb_inval, 29 29 struct xe_tlb_inval_fence *fence,
+35 -1
drivers/gpu/drm/xe/xe_tlb_inval_job.c
··· 7 7 #include "xe_dep_job_types.h" 8 8 #include "xe_dep_scheduler.h" 9 9 #include "xe_exec_queue.h" 10 + #include "xe_gt_printk.h" 10 11 #include "xe_gt_types.h" 12 + #include "xe_page_reclaim.h" 11 13 #include "xe_tlb_inval.h" 12 14 #include "xe_tlb_inval_job.h" 13 15 #include "xe_migrate.h" ··· 26 24 struct xe_exec_queue *q; 27 25 /** @vm: VM which TLB invalidation is being issued for */ 28 26 struct xe_vm *vm; 27 + /** @prl: Embedded copy of page reclaim list */ 28 + struct xe_page_reclaim_list prl; 29 29 /** @refcount: ref count of this job */ 30 30 struct kref refcount; 31 31 /** ··· 51 47 container_of(dep_job, typeof(*job), dep); 52 48 struct xe_tlb_inval_fence *ifence = 53 49 container_of(job->fence, typeof(*ifence), base); 50 + struct drm_suballoc *prl_sa = NULL; 51 + 52 + if (xe_page_reclaim_list_valid(&job->prl)) { 53 + prl_sa = xe_page_reclaim_create_prl_bo(job->tlb_inval, &job->prl, ifence); 54 + if (IS_ERR(prl_sa)) 55 + prl_sa = NULL; /* Indicate fall back PPC flush with NULL */ 56 + } 54 57 55 58 xe_tlb_inval_range(job->tlb_inval, ifence, job->start, 56 - job->end, job->vm->usm.asid); 59 + job->end, job->vm->usm.asid, prl_sa); 57 60 58 61 return job->fence; 59 62 } ··· 118 107 job->start = start; 119 108 job->end = end; 120 109 job->fence_armed = false; 110 + xe_page_reclaim_list_init(&job->prl); 121 111 job->dep.ops = &dep_job_ops; 122 112 job->type = type; 123 113 kref_init(&job->refcount); ··· 152 140 return ERR_PTR(err); 153 141 } 154 142 143 + /** 144 + * xe_tlb_inval_job_add_page_reclaim() - Embed PRL into a TLB job 145 + * @job: TLB invalidation job that may trigger reclamation 146 + * @prl: Page reclaim list populated during unbind 147 + * 148 + * Copies @prl into the job and takes an extra reference to the entry page so 149 + * ownership can transfer to the TLB fence when the job is pushed. 150 + */ 151 + void xe_tlb_inval_job_add_page_reclaim(struct xe_tlb_inval_job *job, 152 + struct xe_page_reclaim_list *prl) 153 + { 154 + struct xe_device *xe = gt_to_xe(job->q->gt); 155 + 156 + xe_gt_WARN_ON(job->q->gt, !xe->info.has_page_reclaim_hw_assist); 157 + job->prl = *prl; 158 + /* Pair with put in job_destroy */ 159 + xe_page_reclaim_entries_get(job->prl.entries); 160 + } 161 + 155 162 static void xe_tlb_inval_job_destroy(struct kref *ref) 156 163 { 157 164 struct xe_tlb_inval_job *job = container_of(ref, typeof(*job), ··· 180 149 struct xe_exec_queue *q = job->q; 181 150 struct xe_device *xe = gt_to_xe(q->gt); 182 151 struct xe_vm *vm = job->vm; 152 + 153 + /* BO creation retains a copy (if used), so no longer needed */ 154 + xe_page_reclaim_entries_put(job->prl.entries); 183 155 184 156 if (!job->fence_armed) 185 157 kfree(ifence);
+4
drivers/gpu/drm/xe/xe_tlb_inval_job.h
··· 12 12 struct xe_dep_scheduler; 13 13 struct xe_exec_queue; 14 14 struct xe_migrate; 15 + struct xe_page_reclaim_list; 15 16 struct xe_tlb_inval; 16 17 struct xe_tlb_inval_job; 17 18 struct xe_vm; ··· 21 20 xe_tlb_inval_job_create(struct xe_exec_queue *q, struct xe_tlb_inval *tlb_inval, 22 21 struct xe_dep_scheduler *dep_scheduler, 23 22 struct xe_vm *vm, u64 start, u64 end, int type); 23 + 24 + void xe_tlb_inval_job_add_page_reclaim(struct xe_tlb_inval_job *job, 25 + struct xe_page_reclaim_list *prl); 24 26 25 27 int xe_tlb_inval_job_alloc_dep(struct xe_tlb_inval_job *job); 26 28
+4 -1
drivers/gpu/drm/xe/xe_tlb_inval_types.h
··· 9 9 #include <linux/workqueue.h> 10 10 #include <linux/dma-fence.h> 11 11 12 + struct drm_suballoc; 12 13 struct xe_tlb_inval; 13 14 14 15 /** struct xe_tlb_inval_ops - TLB invalidation ops (backend) */ ··· 41 40 * @start: Start address 42 41 * @end: End address 43 42 * @asid: Address space ID 43 + * @prl_sa: Suballocation for page reclaim list 44 44 * 45 45 * Return 0 on success, -ECANCELED if backend is mid-reset, error on 46 46 * failure 47 47 */ 48 48 int (*ppgtt)(struct xe_tlb_inval *tlb_inval, u32 seqno, u64 start, 49 - u64 end, u32 asid); 49 + u64 end, u32 asid, struct drm_suballoc *prl_sa); 50 50 51 51 /** 52 52 * @initialized: Backend is initialized ··· 82 80 const struct xe_tlb_inval_ops *ops; 83 81 /** @tlb_inval.seqno: TLB invalidation seqno, protected by CT lock */ 84 82 #define TLB_INVALIDATION_SEQNO_MAX 0x100000 83 + #define TLB_INVALIDATION_SEQNO_INVALID TLB_INVALIDATION_SEQNO_MAX 85 84 int seqno; 86 85 /** @tlb_invalidation.seqno_lock: protects @tlb_invalidation.seqno */ 87 86 struct mutex seqno_lock;
+46
drivers/gpu/drm/xe/xe_trace.h
··· 13 13 #include <linux/types.h> 14 14 15 15 #include "xe_exec_queue_types.h" 16 + #include "xe_exec_queue.h" 16 17 #include "xe_gpu_scheduler_types.h" 17 18 #include "xe_gt_types.h" 18 19 #include "xe_guc_exec_queue_types.h" ··· 98 97 __entry->guc_state, __entry->flags) 99 98 ); 100 99 100 + DECLARE_EVENT_CLASS(xe_exec_queue_multi_queue, 101 + TP_PROTO(struct xe_exec_queue *q), 102 + TP_ARGS(q), 103 + 104 + TP_STRUCT__entry( 105 + __string(dev, __dev_name_eq(q)) 106 + __field(enum xe_engine_class, class) 107 + __field(u32, logical_mask) 108 + __field(u8, gt_id) 109 + __field(u16, width) 110 + __field(u32, guc_id) 111 + __field(u32, guc_state) 112 + __field(u32, flags) 113 + __field(u32, primary) 114 + ), 115 + 116 + TP_fast_assign( 117 + __assign_str(dev); 118 + __entry->class = q->class; 119 + __entry->logical_mask = q->logical_mask; 120 + __entry->gt_id = q->gt->info.id; 121 + __entry->width = q->width; 122 + __entry->guc_id = q->guc->id; 123 + __entry->guc_state = atomic_read(&q->guc->state); 124 + __entry->flags = q->flags; 125 + __entry->primary = xe_exec_queue_multi_queue_primary(q)->guc->id; 126 + ), 127 + 128 + TP_printk("dev=%s, %d:0x%x, gt=%d, width=%d guc_id=%d, guc_state=0x%x, flags=0x%x, primary=%d", 129 + __get_str(dev), __entry->class, __entry->logical_mask, 130 + __entry->gt_id, __entry->width, __entry->guc_id, 131 + __entry->guc_state, __entry->flags, 132 + __entry->primary) 133 + ); 134 + 101 135 DEFINE_EVENT(xe_exec_queue, xe_exec_queue_create, 136 + TP_PROTO(struct xe_exec_queue *q), 137 + TP_ARGS(q) 138 + ); 139 + 140 + DEFINE_EVENT(xe_exec_queue_multi_queue, xe_exec_queue_create_multi_queue, 102 141 TP_PROTO(struct xe_exec_queue *q), 103 142 TP_ARGS(q) 104 143 ); ··· 209 168 ); 210 169 211 170 DEFINE_EVENT(xe_exec_queue, xe_exec_queue_memory_cat_error, 171 + TP_PROTO(struct xe_exec_queue *q), 172 + TP_ARGS(q) 173 + ); 174 + 175 + DEFINE_EVENT(xe_exec_queue, xe_exec_queue_cgp_context_error, 212 176 TP_PROTO(struct xe_exec_queue *q), 213 177 TP_ARGS(q) 214 178 );
+33 -2
drivers/gpu/drm/xe/xe_uc.c
··· 218 218 219 219 xe_guc_engine_activity_enable_stats(&uc->guc); 220 220 221 - /* We don't fail the driver load if HuC fails to auth, but let's warn */ 221 + /* We don't fail the driver load if HuC fails to auth */ 222 222 ret = xe_huc_auth(&uc->huc, XE_HUC_AUTH_VIA_GUC); 223 - xe_gt_assert(uc_to_gt(uc), !ret); 223 + if (ret) 224 + xe_gt_err(uc_to_gt(uc), 225 + "HuC authentication failed (%pe), continuing with no HuC\n", 226 + ERR_PTR(ret)); 224 227 225 228 /* GSC load is async */ 226 229 xe_gsc_load_start(&uc->gsc); ··· 302 299 xe_uc_stop(uc); 303 300 304 301 return xe_guc_suspend(&uc->guc); 302 + } 303 + 304 + /** 305 + * xe_uc_runtime_suspend() - UC runtime suspend 306 + * @uc: the UC object 307 + * 308 + * Runtime suspend all UCs. 309 + */ 310 + void xe_uc_runtime_suspend(struct xe_uc *uc) 311 + { 312 + if (!xe_device_uc_enabled(uc_to_xe(uc))) 313 + return; 314 + 315 + xe_guc_runtime_suspend(&uc->guc); 316 + } 317 + 318 + /** 319 + * xe_uc_runtime_resume() - UC runtime resume 320 + * @uc: the UC object 321 + * 322 + * Runtime resume all UCs. 323 + */ 324 + void xe_uc_runtime_resume(struct xe_uc *uc) 325 + { 326 + if (!xe_device_uc_enabled(uc_to_xe(uc))) 327 + return; 328 + 329 + xe_guc_runtime_resume(&uc->guc); 305 330 } 306 331 307 332 /**
+2
drivers/gpu/drm/xe/xe_uc.h
··· 14 14 int xe_uc_load_hw(struct xe_uc *uc); 15 15 void xe_uc_gucrc_disable(struct xe_uc *uc); 16 16 int xe_uc_reset_prepare(struct xe_uc *uc); 17 + void xe_uc_runtime_resume(struct xe_uc *uc); 18 + void xe_uc_runtime_suspend(struct xe_uc *uc); 17 19 void xe_uc_stop_prepare(struct xe_uc *uc); 18 20 void xe_uc_stop(struct xe_uc *uc); 19 21 int xe_uc_start(struct xe_uc *uc);
+5 -5
drivers/gpu/drm/xe/xe_uc_fw.c
··· 115 115 #define XE_GT_TYPE_ANY XE_GT_TYPE_UNINITIALIZED 116 116 117 117 #define XE_GUC_FIRMWARE_DEFS(fw_def, mmp_ver, major_ver) \ 118 - fw_def(PANTHERLAKE, GT_TYPE_ANY, major_ver(xe, guc, ptl, 70, 49, 4)) \ 119 - fw_def(BATTLEMAGE, GT_TYPE_ANY, major_ver(xe, guc, bmg, 70, 49, 4)) \ 120 - fw_def(LUNARLAKE, GT_TYPE_ANY, major_ver(xe, guc, lnl, 70, 45, 2)) \ 121 - fw_def(METEORLAKE, GT_TYPE_ANY, major_ver(i915, guc, mtl, 70, 44, 1)) \ 122 - fw_def(DG2, GT_TYPE_ANY, major_ver(i915, guc, dg2, 70, 45, 2)) \ 118 + fw_def(PANTHERLAKE, GT_TYPE_ANY, major_ver(xe, guc, ptl, 70, 54, 0)) \ 119 + fw_def(BATTLEMAGE, GT_TYPE_ANY, major_ver(xe, guc, bmg, 70, 54, 0)) \ 120 + fw_def(LUNARLAKE, GT_TYPE_ANY, major_ver(xe, guc, lnl, 70, 53, 0)) \ 121 + fw_def(METEORLAKE, GT_TYPE_ANY, major_ver(i915, guc, mtl, 70, 53, 0)) \ 122 + fw_def(DG2, GT_TYPE_ANY, major_ver(i915, guc, dg2, 70, 53, 0)) \ 123 123 fw_def(DG1, GT_TYPE_ANY, major_ver(i915, guc, dg1, 70, 44, 1)) \ 124 124 fw_def(ALDERLAKE_N, GT_TYPE_ANY, major_ver(i915, guc, tgl, 70, 44, 1)) \ 125 125 fw_def(ALDERLAKE_P, GT_TYPE_ANY, major_ver(i915, guc, adlp, 70, 44, 1)) \
+136 -18
drivers/gpu/drm/xe/xe_vm.c
··· 1509 1509 1510 1510 INIT_LIST_HEAD(&vm->preempt.exec_queues); 1511 1511 if (flags & XE_VM_FLAG_FAULT_MODE) 1512 - vm->preempt.min_run_period_ms = 0; 1512 + vm->preempt.min_run_period_ms = xe->min_run_period_pf_ms; 1513 1513 else 1514 - vm->preempt.min_run_period_ms = 5; 1514 + vm->preempt.min_run_period_ms = xe->min_run_period_lr_ms; 1515 1515 1516 1516 for_each_tile(tile, xe, id) 1517 1517 xe_range_fence_tree_init(&vm->rftree[id]); ··· 2236 2236 struct drm_gpuva_ops *ops; 2237 2237 struct drm_gpuva_op *__op; 2238 2238 struct drm_gpuvm_bo *vm_bo; 2239 + u64 range_start = addr; 2239 2240 u64 range_end = addr + range; 2240 2241 int err; 2241 2242 ··· 2249 2248 2250 2249 switch (operation) { 2251 2250 case DRM_XE_VM_BIND_OP_MAP: 2251 + if (flags & DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR) { 2252 + xe_vm_find_cpu_addr_mirror_vma_range(vm, &range_start, &range_end); 2253 + vops->flags |= XE_VMA_OPS_FLAG_ALLOW_SVM_UNMAP; 2254 + } 2255 + 2256 + fallthrough; 2252 2257 case DRM_XE_VM_BIND_OP_MAP_USERPTR: { 2253 2258 struct drm_gpuvm_map_req map_req = { 2254 - .map.va.addr = addr, 2255 - .map.va.range = range, 2259 + .map.va.addr = range_start, 2260 + .map.va.range = range_end - range_start, 2256 2261 .map.gem.obj = obj, 2257 2262 .map.gem.offset = bo_offset_or_userptr, 2258 2263 }; ··· 2458 2451 if (IS_ERR(vma)) 2459 2452 return vma; 2460 2453 2461 - if (xe_vma_is_userptr(vma)) 2454 + if (xe_vma_is_userptr(vma)) { 2462 2455 err = xe_vma_userptr_pin_pages(to_userptr_vma(vma)); 2456 + /* 2457 + * -EBUSY has dedicated meaning that a user fence 2458 + * attached to the VMA is busy, in practice 2459 + * xe_vma_userptr_pin_pages can only fail with -EBUSY if 2460 + * we are low on memory so convert this to -ENOMEM. 2461 + */ 2462 + if (err == -EBUSY) 2463 + err = -ENOMEM; 2464 + } 2463 2465 } 2464 2466 if (err) { 2465 2467 prep_vma_destroy(vm, vma, false); ··· 2743 2727 2744 2728 if (xe_vma_is_cpu_addr_mirror(vma) && 2745 2729 xe_svm_has_mapping(vm, xe_vma_start(vma), 2746 - xe_vma_end(vma))) 2730 + xe_vma_end(vma)) && 2731 + !(vops->flags & XE_VMA_OPS_FLAG_ALLOW_SVM_UNMAP)) 2747 2732 return -EBUSY; 2748 2733 2749 2734 if (!xe_vma_is_cpu_addr_mirror(vma)) ··· 3124 3107 struct dma_fence *fence = NULL; 3125 3108 struct dma_fence **fences = NULL; 3126 3109 struct dma_fence_array *cf = NULL; 3127 - int number_tiles = 0, current_fence = 0, n_fence = 0, err; 3110 + int number_tiles = 0, current_fence = 0, n_fence = 0, err, i; 3128 3111 u8 id; 3129 3112 3130 3113 number_tiles = vm_ops_setup_tile_args(vm, vops); 3131 3114 if (number_tiles == 0) 3132 3115 return ERR_PTR(-ENODATA); 3133 3116 3134 - if (vops->flags & XE_VMA_OPS_FLAG_SKIP_TLB_WAIT) { 3135 - for_each_tile(tile, vm->xe, id) 3136 - ++n_fence; 3137 - } else { 3138 - for_each_tile(tile, vm->xe, id) 3139 - n_fence += (1 + XE_MAX_GT_PER_TILE); 3117 + for_each_tile(tile, vm->xe, id) { 3118 + ++n_fence; 3119 + 3120 + if (!(vops->flags & XE_VMA_OPS_FLAG_SKIP_TLB_WAIT)) 3121 + for_each_tlb_inval(i) 3122 + ++n_fence; 3140 3123 } 3141 3124 3142 3125 fences = kmalloc_array(n_fence, sizeof(*fences), GFP_KERNEL); ··· 3166 3149 3167 3150 for_each_tile(tile, vm->xe, id) { 3168 3151 struct xe_exec_queue *q = vops->pt_update_ops[tile->id].q; 3169 - int i; 3170 3152 3171 3153 fence = NULL; 3172 3154 if (!vops->pt_update_ops[id].num_ops) ··· 3230 3214 { 3231 3215 switch (op->base.op) { 3232 3216 case DRM_GPUVA_OP_MAP: 3233 - vma_add_ufence(op->map.vma, ufence); 3217 + if (!xe_vma_is_cpu_addr_mirror(op->map.vma)) 3218 + vma_add_ufence(op->map.vma, ufence); 3234 3219 break; 
3235 3220 case DRM_GPUVA_OP_REMAP: 3236 3221 if (op->remap.prev) ··· 3506 3489 u16 pat_index, u32 op, u32 bind_flags) 3507 3490 { 3508 3491 u16 coh_mode; 3492 + 3493 + if (XE_IOCTL_DBG(xe, (bo->flags & XE_BO_FLAG_NO_COMPRESSION) && 3494 + xe_pat_index_get_comp_en(xe, pat_index))) 3495 + return -EINVAL; 3509 3496 3510 3497 if (XE_IOCTL_DBG(xe, range > xe_bo_size(bo)) || 3511 3498 XE_IOCTL_DBG(xe, obj_offset > ··· 3934 3913 3935 3914 err = xe_tlb_inval_range(&tile->primary_gt->tlb_inval, 3936 3915 &fence[fence_id], start, end, 3937 - vm->usm.asid); 3916 + vm->usm.asid, NULL); 3938 3917 if (err) 3939 3918 goto wait; 3940 3919 ++fence_id; ··· 3947 3926 3948 3927 err = xe_tlb_inval_range(&tile->media_gt->tlb_inval, 3949 3928 &fence[fence_id], start, end, 3950 - vm->usm.asid); 3929 + vm->usm.asid, NULL); 3951 3930 if (err) 3952 3931 goto wait; 3953 3932 ++fence_id; ··· 4053 4032 } 4054 4033 4055 4034 struct xe_vm_snapshot { 4035 + int uapi_flags; 4056 4036 unsigned long num_snaps; 4057 4037 struct { 4058 4038 u64 ofs, bo_ofs; 4059 4039 unsigned long len; 4040 + #define XE_VM_SNAP_FLAG_USERPTR BIT(0) 4041 + #define XE_VM_SNAP_FLAG_READ_ONLY BIT(1) 4042 + #define XE_VM_SNAP_FLAG_IS_NULL BIT(2) 4043 + unsigned long flags; 4044 + int uapi_mem_region; 4045 + int pat_index; 4046 + int cpu_caching; 4060 4047 struct xe_bo *bo; 4061 4048 void *data; 4062 4049 struct mm_struct *mm; ··· 4093 4064 goto out_unlock; 4094 4065 } 4095 4066 4067 + if (vm->flags & XE_VM_FLAG_FAULT_MODE) 4068 + snap->uapi_flags |= DRM_XE_VM_CREATE_FLAG_FAULT_MODE; 4069 + if (vm->flags & XE_VM_FLAG_LR_MODE) 4070 + snap->uapi_flags |= DRM_XE_VM_CREATE_FLAG_LR_MODE; 4071 + if (vm->flags & XE_VM_FLAG_SCRATCH_PAGE) 4072 + snap->uapi_flags |= DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE; 4073 + 4096 4074 snap->num_snaps = num_snaps; 4097 4075 i = 0; 4098 4076 drm_gpuvm_for_each_va(gpuva, &vm->gpuvm) { ··· 4112 4076 4113 4077 snap->snap[i].ofs = xe_vma_start(vma); 4114 4078 snap->snap[i].len = xe_vma_size(vma); 4079 + snap->snap[i].flags = xe_vma_read_only(vma) ? 
4080 + XE_VM_SNAP_FLAG_READ_ONLY : 0; 4081 + snap->snap[i].pat_index = vma->attr.pat_index; 4115 4082 if (bo) { 4083 + snap->snap[i].cpu_caching = bo->cpu_caching; 4116 4084 snap->snap[i].bo = xe_bo_get(bo); 4117 4085 snap->snap[i].bo_ofs = xe_vma_bo_offset(vma); 4086 + switch (bo->ttm.resource->mem_type) { 4087 + case XE_PL_SYSTEM: 4088 + case XE_PL_TT: 4089 + snap->snap[i].uapi_mem_region = 0; 4090 + break; 4091 + case XE_PL_VRAM0: 4092 + snap->snap[i].uapi_mem_region = 1; 4093 + break; 4094 + case XE_PL_VRAM1: 4095 + snap->snap[i].uapi_mem_region = 2; 4096 + break; 4097 + } 4118 4098 } else if (xe_vma_is_userptr(vma)) { 4119 4099 struct mm_struct *mm = 4120 4100 to_userptr_vma(vma)->userptr.notifier.mm; ··· 4141 4089 snap->snap[i].data = ERR_PTR(-EFAULT); 4142 4090 4143 4091 snap->snap[i].bo_ofs = xe_vma_userptr(vma); 4092 + snap->snap[i].flags |= XE_VM_SNAP_FLAG_USERPTR; 4093 + snap->snap[i].uapi_mem_region = 0; 4094 + } else if (xe_vma_is_null(vma)) { 4095 + snap->snap[i].flags |= XE_VM_SNAP_FLAG_IS_NULL; 4096 + snap->snap[i].uapi_mem_region = -1; 4144 4097 } else { 4145 4098 snap->snap[i].data = ERR_PTR(-ENOENT); 4099 + snap->snap[i].uapi_mem_region = -1; 4146 4100 } 4147 4101 i++; 4148 4102 } ··· 4167 4109 struct xe_bo *bo = snap->snap[i].bo; 4168 4110 int err; 4169 4111 4170 - if (IS_ERR(snap->snap[i].data)) 4112 + if (IS_ERR(snap->snap[i].data) || 4113 + snap->snap[i].flags & XE_VM_SNAP_FLAG_IS_NULL) 4171 4114 continue; 4172 4115 4173 4116 snap->snap[i].data = kvmalloc(snap->snap[i].len, GFP_USER); ··· 4214 4155 return; 4215 4156 } 4216 4157 4158 + drm_printf(p, "VM.uapi_flags: 0x%x\n", snap->uapi_flags); 4217 4159 for (i = 0; i < snap->num_snaps; i++) { 4218 4160 drm_printf(p, "[%llx].length: 0x%lx\n", snap->snap[i].ofs, snap->snap[i].len); 4161 + 4162 + drm_printf(p, "[%llx].properties: %s|%s|mem_region=0x%lx|pat_index=%d|cpu_caching=%d\n", 4163 + snap->snap[i].ofs, 4164 + snap->snap[i].flags & XE_VM_SNAP_FLAG_READ_ONLY ? 4165 + "read_only" : "read_write", 4166 + snap->snap[i].flags & XE_VM_SNAP_FLAG_IS_NULL ? 4167 + "null_sparse" : 4168 + snap->snap[i].flags & XE_VM_SNAP_FLAG_USERPTR ? 4169 + "userptr" : "bo", 4170 + snap->snap[i].uapi_mem_region == -1 ? 
0 : 4171 + BIT(snap->snap[i].uapi_mem_region), 4172 + snap->snap[i].pat_index, 4173 + snap->snap[i].cpu_caching); 4219 4174 4220 4175 if (IS_ERR(snap->snap[i].data)) { 4221 4176 drm_printf(p, "[%llx].error: %li\n", snap->snap[i].ofs, 4222 4177 PTR_ERR(snap->snap[i].data)); 4223 4178 continue; 4224 4179 } 4180 + 4181 + if (snap->snap[i].flags & XE_VM_SNAP_FLAG_IS_NULL) 4182 + continue; 4225 4183 4226 4184 drm_printf(p, "[%llx].data: ", snap->snap[i].ofs); 4227 4185 ··· 4393 4317 4394 4318 if (is_madvise) 4395 4319 vops.flags |= XE_VMA_OPS_FLAG_MADVISE; 4320 + else 4321 + vops.flags |= XE_VMA_OPS_FLAG_ALLOW_SVM_UNMAP; 4396 4322 4397 4323 err = vm_bind_ioctl_ops_parse(vm, ops, &vops); 4398 4324 if (err) ··· 4466 4388 vm_dbg(&vm->xe->drm, "MADVISE_OPS_CREATE: addr=0x%016llx, size=0x%016llx", start, range); 4467 4389 4468 4390 return xe_vm_alloc_vma(vm, &map_req, true); 4391 + } 4392 + 4393 + static bool is_cpu_addr_vma_with_default_attr(struct xe_vma *vma) 4394 + { 4395 + return vma && xe_vma_is_cpu_addr_mirror(vma) && 4396 + xe_vma_has_default_mem_attrs(vma); 4397 + } 4398 + 4399 + /** 4400 + * xe_vm_find_cpu_addr_mirror_vma_range - Extend a VMA range to include adjacent CPU-mirrored VMAs 4401 + * @vm: VM to search within 4402 + * @start: Input/output pointer to the starting address of the range 4403 + * @end: Input/output pointer to the end address of the range 4404 + * 4405 + * Given a range defined by @start and @range, this function checks the VMAs 4406 + * immediately before and after the range. If those neighboring VMAs are 4407 + * CPU-address-mirrored and have default memory attributes, the function 4408 + * updates @start and @range to include them. This extended range can then 4409 + * be used for merging or other operations that require a unified VMA. 4410 + * 4411 + * The function does not perform the merge itself; it only computes the 4412 + * mergeable boundaries. 4413 + */ 4414 + void xe_vm_find_cpu_addr_mirror_vma_range(struct xe_vm *vm, u64 *start, u64 *end) 4415 + { 4416 + struct xe_vma *prev, *next; 4417 + 4418 + lockdep_assert_held(&vm->lock); 4419 + 4420 + if (*start >= SZ_4K) { 4421 + prev = xe_vm_find_vma_by_addr(vm, *start - SZ_4K); 4422 + if (is_cpu_addr_vma_with_default_attr(prev)) 4423 + *start = xe_vma_start(prev); 4424 + } 4425 + 4426 + if (*end < vm->size) { 4427 + next = xe_vm_find_vma_by_addr(vm, *end + 1); 4428 + if (is_cpu_addr_vma_with_default_attr(next)) 4429 + *end = xe_vma_end(next); 4430 + } 4469 4431 } 4470 4432 4471 4433 /**
+3
drivers/gpu/drm/xe/xe_vm.h
··· 68 68 69 69 bool xe_vma_has_default_mem_attrs(struct xe_vma *vma); 70 70 71 + void xe_vm_find_cpu_addr_mirror_vma_range(struct xe_vm *vm, 72 + u64 *start, 73 + u64 *end); 71 74 /** 72 75 * xe_vm_has_scratch() - Whether the vm is configured for scratch PTEs 73 76 * @vm: The vm
+1
drivers/gpu/drm/xe/xe_vm_types.h
··· 467 467 #define XE_VMA_OPS_FLAG_MADVISE BIT(1) 468 468 #define XE_VMA_OPS_ARRAY_OF_BINDS BIT(2) 469 469 #define XE_VMA_OPS_FLAG_SKIP_TLB_WAIT BIT(3) 470 + #define XE_VMA_OPS_FLAG_ALLOW_SVM_UNMAP BIT(4) 470 471 u32 flags; 471 472 #ifdef TEST_VM_OPS_ERROR 472 473 /** @inject_error: inject error to test error handling */
+2 -4
drivers/gpu/drm/xe/xe_vram.c
··· 156 156 static int get_flat_ccs_offset(struct xe_gt *gt, u64 tile_size, u64 *poffset) 157 157 { 158 158 struct xe_device *xe = gt_to_xe(gt); 159 - unsigned int fw_ref; 160 159 u64 offset; 161 160 u32 reg; 162 161 163 - fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); 164 - if (!fw_ref) 162 + CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT); 163 + if (!fw_ref.domains) 165 164 return -ETIMEDOUT; 166 165 167 166 if (GRAPHICS_VER(xe) >= 20) { ··· 192 193 offset = (u64)REG_FIELD_GET(XEHP_FLAT_CCS_PTR, reg) * SZ_64K; 193 194 } 194 195 195 - xe_force_wake_put(gt_to_fw(gt), fw_ref); 196 196 *poffset = offset; 197 197 198 198 return 0;
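Several hunks in this series (xe_sriov_vf_ccs.c, xe_tile_debugfs.c, xe_tile_sriov_pf_debugfs.c and the forcewake change above) replace explicit get/put pairs with scope-based guards from the kernel's <linux/cleanup.h>; the release now runs automatically on every exit path, which is why the matching put calls disappear. A standalone approximation of the underlying mechanism, using the compiler cleanup attribute directly rather than the kernel's DEFINE_CLASS()/guard() wrappers:

```c
#include <stdio.h>

struct fake_pm { int wakeref; };

static void pm_put(struct fake_pm **pm)
{
	if (*pm) {
		(*pm)->wakeref--;
		printf("reference dropped automatically at scope exit\n");
	}
}

static struct fake_pm *pm_get(struct fake_pm *pm)
{
	pm->wakeref++;
	return pm;
}

int main(void)
{
	struct fake_pm dev = { 0 };

	{
		/* Comparable in spirit to guard(xe_pm_runtime)(xe) */
		__attribute__((cleanup(pm_put))) struct fake_pm *ref = pm_get(&dev);

		printf("working with wakeref=%d\n", ref->wakeref);
		/* an early return here would still drop the reference */
	}

	printf("after scope: wakeref=%d\n", dev.wakeref);
	return 0;
}
```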
+6 -46
drivers/gpu/drm/xe/xe_wa.c
··· 15 15 16 16 #include "regs/xe_engine_regs.h" 17 17 #include "regs/xe_gt_regs.h" 18 + #include "regs/xe_guc_regs.h" 18 19 #include "regs/xe_regs.h" 19 20 #include "xe_device_types.h" 20 21 #include "xe_force_wake.h" ··· 217 216 XE_RTP_ACTIONS(SET(XELPMP_SQCNT1, ENFORCE_RAR)) 218 217 }, 219 218 220 - /* Xe2_LPG */ 221 - 222 - { XE_RTP_NAME("16020975621"), 223 - XE_RTP_RULES(GRAPHICS_VERSION(2004), GRAPHICS_STEP(A0, B0)), 224 - XE_RTP_ACTIONS(SET(XEHP_SLICE_UNIT_LEVEL_CLKGATE, SBEUNIT_CLKGATE_DIS)) 225 - }, 226 - { XE_RTP_NAME("14018157293"), 227 - XE_RTP_RULES(GRAPHICS_VERSION(2004), GRAPHICS_STEP(A0, B0)), 228 - XE_RTP_ACTIONS(SET(XEHPC_L3CLOS_MASK(0), ~0), 229 - SET(XEHPC_L3CLOS_MASK(1), ~0), 230 - SET(XEHPC_L3CLOS_MASK(2), ~0), 231 - SET(XEHPC_L3CLOS_MASK(3), ~0)) 232 - }, 233 - 234 219 /* Xe2_LPM */ 235 220 236 221 { XE_RTP_NAME("14017421178"), ··· 301 314 ENGINE_CLASS(VIDEO_DECODE)), 302 315 XE_RTP_ACTIONS(SET(VDBOX_CGCTL3F10(0), RAMDFTUNIT_CLKGATE_DIS)), 303 316 XE_RTP_ENTRY_FLAG(FOREACH_ENGINE), 317 + }, 318 + { XE_RTP_NAME("16028005424"), 319 + XE_RTP_RULES(GRAPHICS_VERSION_RANGE(3000, 3005)), 320 + XE_RTP_ACTIONS(SET(GUC_INTR_CHICKEN, DISABLE_SIGNALING_ENGINES)) 304 321 }, 305 322 }; 306 323 ··· 495 504 XE_RTP_RULES(GRAPHICS_VERSION(2004), FUNC(xe_rtp_match_first_render_or_compute)), 496 505 XE_RTP_ACTIONS(SET(LSC_CHICKEN_BIT_0_UDW, XE2_ALLOC_DPA_STARVE_FIX_DIS)) 497 506 }, 498 - { XE_RTP_NAME("14018957109"), 499 - XE_RTP_RULES(GRAPHICS_VERSION(2004), GRAPHICS_STEP(A0, B0), 500 - FUNC(xe_rtp_match_first_render_or_compute)), 501 - XE_RTP_ACTIONS(SET(HALF_SLICE_CHICKEN5, DISABLE_SAMPLE_G_PERFORMANCE)) 502 - }, 503 507 { XE_RTP_NAME("14020338487"), 504 508 XE_RTP_RULES(GRAPHICS_VERSION(2004), FUNC(xe_rtp_match_first_render_or_compute)), 505 509 XE_RTP_ACTIONS(SET(ROW_CHICKEN3, XE2_EUPEND_CHK_FLUSH_DIS)) ··· 503 517 XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, 2004), 504 518 FUNC(xe_rtp_match_first_render_or_compute)), 505 519 XE_RTP_ACTIONS(SET(ROW_CHICKEN4, DISABLE_TDL_PUSH)) 506 - }, 507 - { XE_RTP_NAME("14019322943"), 508 - XE_RTP_RULES(GRAPHICS_VERSION(2004), GRAPHICS_STEP(A0, B0), 509 - FUNC(xe_rtp_match_first_render_or_compute)), 510 - XE_RTP_ACTIONS(SET(LSC_CHICKEN_BIT_0, TGM_WRITE_EOM_FORCE)) 511 520 }, 512 521 { XE_RTP_NAME("14018471104"), 513 522 XE_RTP_RULES(GRAPHICS_VERSION(2004), FUNC(xe_rtp_match_first_render_or_compute)), ··· 674 693 XE_RTP_ACTIONS(SET(HALF_SLICE_CHICKEN7, CLEAR_OPTIMIZATION_DISABLE)) 675 694 }, 676 695 { XE_RTP_NAME("18041344222"), 677 - XE_RTP_RULES(GRAPHICS_VERSION_RANGE(3000, 3001), 696 + XE_RTP_RULES(GRAPHICS_VERSION(3000), 678 697 FUNC(xe_rtp_match_first_render_or_compute), 679 698 FUNC(xe_rtp_match_not_sriov_vf), 680 699 FUNC(xe_rtp_match_gt_has_discontiguous_dss_groups)), ··· 780 799 781 800 /* Xe2_LPG */ 782 801 783 - { XE_RTP_NAME("16020518922"), 784 - XE_RTP_RULES(GRAPHICS_VERSION(2004), GRAPHICS_STEP(A0, B0), 785 - ENGINE_CLASS(RENDER)), 786 - XE_RTP_ACTIONS(SET(FF_MODE, 787 - DIS_TE_AUTOSTRIP | 788 - DIS_MESH_PARTIAL_AUTOSTRIP | 789 - DIS_MESH_AUTOSTRIP), 790 - SET(VFLSKPD, 791 - DIS_PARTIAL_AUTOSTRIP | 792 - DIS_AUTOSTRIP)) 793 - }, 794 802 { XE_RTP_NAME("14019386621"), 795 803 XE_RTP_RULES(GRAPHICS_VERSION(2004), ENGINE_CLASS(RENDER)), 796 804 XE_RTP_ACTIONS(SET(VF_SCRATCHPAD, XE2_VFG_TED_CREDIT_INTERFACE_DISABLE)) ··· 788 818 XE_RTP_RULES(GRAPHICS_VERSION(2004), ENGINE_CLASS(RENDER)), 789 819 XE_RTP_ACTIONS(SET(XEHP_PSS_CHICKEN, FD_END_COLLECT)) 790 820 }, 791 - { XE_RTP_NAME("14020013138"), 792 - XE_RTP_RULES(GRAPHICS_VERSION(2004), 
GRAPHICS_STEP(A0, B0), 793 - ENGINE_CLASS(RENDER)), 794 - XE_RTP_ACTIONS(SET(WM_CHICKEN3, HIZ_PLANE_COMPRESSION_DIS)) 795 - }, 796 821 { XE_RTP_NAME("14019988906"), 797 822 XE_RTP_RULES(GRAPHICS_VERSION(2004), ENGINE_CLASS(RENDER)), 798 823 XE_RTP_ACTIONS(SET(XEHP_PSS_CHICKEN, FLSH_IGNORES_PSD)) 799 - }, 800 - { XE_RTP_NAME("16020183090"), 801 - XE_RTP_RULES(GRAPHICS_VERSION(2004), GRAPHICS_STEP(A0, B0), 802 - ENGINE_CLASS(RENDER)), 803 - XE_RTP_ACTIONS(SET(INSTPM(RENDER_RING_BASE), ENABLE_SEMAPHORE_POLL_BIT)) 804 824 }, 805 825 { XE_RTP_NAME("18033852989"), 806 826 XE_RTP_RULES(GRAPHICS_VERSION(2004), ENGINE_CLASS(RENDER)),
+1 -5
drivers/gpu/drm/xe/xe_wa_oob.rules
··· 16 16 16017236439 PLATFORM(PVC) 17 17 14019821291 MEDIA_VERSION_RANGE(1300, 2000) 18 18 14015076503 MEDIA_VERSION(1300) 19 - 16020292621 GRAPHICS_VERSION(2004), GRAPHICS_STEP(A0, B0) 20 - 14018913170 GRAPHICS_VERSION(2004), GRAPHICS_STEP(A0, B0) 21 - MEDIA_VERSION(2000), GRAPHICS_STEP(A0, A1) 22 - GRAPHICS_VERSION_RANGE(1270, 1274) 19 + 14018913170 GRAPHICS_VERSION_RANGE(1270, 1274) 23 20 MEDIA_VERSION(1300) 24 21 PLATFORM(DG2) 25 22 14018094691 GRAPHICS_VERSION_RANGE(2001, 2002) 26 23 GRAPHICS_VERSION(2004) 27 - 14019882105 GRAPHICS_VERSION(2004), GRAPHICS_STEP(A0, B0) 28 24 18024947630 GRAPHICS_VERSION(2001) 29 25 GRAPHICS_VERSION(2004) 30 26 MEDIA_VERSION(2000)
+78 -3
include/uapi/drm/xe_drm.h
··· 106 106 #define DRM_XE_OBSERVATION 0x0b 107 107 #define DRM_XE_MADVISE 0x0c 108 108 #define DRM_XE_VM_QUERY_MEM_RANGE_ATTRS 0x0d 109 + #define DRM_XE_EXEC_QUEUE_SET_PROPERTY 0x0e 109 110 110 111 /* Must be kept compact -- no holes */ 111 112 ··· 124 123 #define DRM_IOCTL_XE_OBSERVATION DRM_IOW(DRM_COMMAND_BASE + DRM_XE_OBSERVATION, struct drm_xe_observation_param) 125 124 #define DRM_IOCTL_XE_MADVISE DRM_IOW(DRM_COMMAND_BASE + DRM_XE_MADVISE, struct drm_xe_madvise) 126 125 #define DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_VM_QUERY_MEM_RANGE_ATTRS, struct drm_xe_vm_query_mem_range_attr) 126 + #define DRM_IOCTL_XE_EXEC_QUEUE_SET_PROPERTY DRM_IOW(DRM_COMMAND_BASE + DRM_XE_EXEC_QUEUE_SET_PROPERTY, struct drm_xe_exec_queue_set_property) 127 127 128 128 /** 129 129 * DOC: Xe IOCTL Extensions ··· 212 210 /** @pad: MBZ */ 213 211 __u32 pad; 214 212 215 - /** @value: property value */ 216 - __u64 value; 213 + union { 214 + /** @value: property value */ 215 + __u64 value; 216 + /** @ptr: pointer to user value */ 217 + __u64 ptr; 218 + }; 217 219 218 220 /** @reserved: Reserved */ 219 221 __u64 reserved[2]; ··· 409 403 * has low latency hint support 410 404 * - %DRM_XE_QUERY_CONFIG_FLAG_HAS_CPU_ADDR_MIRROR - Flag is set if the 411 405 * device has CPU address mirroring support 406 + * - %DRM_XE_QUERY_CONFIG_FLAG_HAS_NO_COMPRESSION_HINT - Flag is set if the 407 + * device supports the userspace hint %DRM_XE_GEM_CREATE_FLAG_NO_COMPRESSION. 408 + * This is exposed only on Xe2+. 412 409 * - %DRM_XE_QUERY_CONFIG_MIN_ALIGNMENT - Minimal memory alignment 413 410 * required by this device, typically SZ_4K or SZ_64K 414 411 * - %DRM_XE_QUERY_CONFIG_VA_BITS - Maximum bits of a virtual address ··· 430 421 #define DRM_XE_QUERY_CONFIG_FLAG_HAS_VRAM (1 << 0) 431 422 #define DRM_XE_QUERY_CONFIG_FLAG_HAS_LOW_LATENCY (1 << 1) 432 423 #define DRM_XE_QUERY_CONFIG_FLAG_HAS_CPU_ADDR_MIRROR (1 << 2) 424 + #define DRM_XE_QUERY_CONFIG_FLAG_HAS_NO_COMPRESSION_HINT (1 << 3) 433 425 #define DRM_XE_QUERY_CONFIG_MIN_ALIGNMENT 2 434 426 #define DRM_XE_QUERY_CONFIG_VA_BITS 3 435 427 #define DRM_XE_QUERY_CONFIG_MAX_EXEC_QUEUE_PRIORITY 4 ··· 801 791 * need to use VRAM for display surfaces, therefore the kernel requires 802 792 * setting this flag for such objects, otherwise an error is thrown on 803 793 * small-bar systems. 794 + * - %DRM_XE_GEM_CREATE_FLAG_NO_COMPRESSION - Allows userspace to 795 + * hint that compression (CCS) should be disabled for the buffer being 796 + * created. This can avoid unnecessary memory operations and CCS state 797 + * management. 798 + * On pre-Xe2 platforms, this flag is currently rejected as compression 799 + * control is not supported via PAT index. On Xe2+ platforms, compression 800 + * is controlled via PAT entries. If this flag is set, the driver will reject 801 + * any VM bind that requests a PAT index enabling compression for this BO. 802 + * Note: On dGPU platforms, there is currently no change in behavior with 803 + * this flag, but future improvements may leverage it. The current benefit is 804 + * primarily applicable to iGPU platforms. 
804 805 * 805 806 * @cpu_caching supports the following values: 806 807 * - %DRM_XE_GEM_CPU_CACHING_WB - Allocate the pages with write-back ··· 858 837 #define DRM_XE_GEM_CREATE_FLAG_DEFER_BACKING (1 << 0) 859 838 #define DRM_XE_GEM_CREATE_FLAG_SCANOUT (1 << 1) 860 839 #define DRM_XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM (1 << 2) 840 + #define DRM_XE_GEM_CREATE_FLAG_NO_COMPRESSION (1 << 3) 861 841 /** 862 842 * @flags: Flags, currently a mask of memory instances of where BO can 863 843 * be placed ··· 1274 1252 * Given that going into a power-saving state kills PXP HWDRM sessions, 1275 1253 * runtime PM will be blocked while queues of this type are alive. 1276 1254 * All PXP queues will be killed if a PXP invalidation event occurs. 1255 + * - %DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP - Create a multi-queue group 1256 + * or add secondary queues to a multi-queue group. 1257 + * If the extension's 'value' field has the %DRM_XE_MULTI_GROUP_CREATE flag set, 1258 + * a new multi-queue group is created with this queue as the primary queue 1259 + * (Q0). Otherwise, the queue is added to the multi-queue group whose primary 1260 + * queue's exec_queue_id is specified in the lower 32 bits of the 'value' field. 1261 + * If the extension's 'value' field has the %DRM_XE_MULTI_GROUP_KEEP_ACTIVE flag 1262 + * set, the multi-queue group is kept active after the primary queue is 1263 + * destroyed. 1264 + * All other bits of the extension's 'value' field must be set to 0 when adding 1265 + * either the primary or a secondary queue to the group. 1266 + * - %DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY - Set the queue 1267 + * priority within the multi-queue group. Valid priority values are currently 0–2 1268 + * (default is 1), with higher values indicating higher priority.
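Editor's note: based on the multi-queue documentation above, a plausible userspace flow creates the primary queue (Q0) with the MULTI_GROUP extension and DRM_XE_MULTI_GROUP_CREATE set, then creates secondary queues passing the primary's exec_queue_id in the lower 32 bits of 'value'. The sketch below is illustrative only; the fd, vm_id, engine instance and the helper name are assumptions.

    /* Hypothetical sketch of the multi-queue group flow described above. */
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <drm/xe_drm.h>

    static int create_multi_queue_group(int fd, __u32 vm_id,
                                        struct drm_xe_engine_class_instance *eci,
                                        __u32 *primary_id, __u32 *secondary_id)
    {
        struct drm_xe_ext_set_property ext = {
            .base.name = DRM_XE_EXEC_QUEUE_EXTENSION_SET_PROPERTY,
            .property = DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP,
            /* Q0: request a new group; all other value bits stay zero. */
            .value = DRM_XE_MULTI_GROUP_CREATE,
        };
        struct drm_xe_exec_queue_create create = {
            .extensions = (uintptr_t)&ext,
            .width = 1,
            .num_placements = 1,
            .vm_id = vm_id,
            .instances = (uintptr_t)eci,
        };

        if (ioctl(fd, DRM_IOCTL_XE_EXEC_QUEUE_CREATE, &create))
            return -1;
        *primary_id = create.exec_queue_id;

        /* Secondary queue: lower 32 bits of 'value' name the primary queue. */
        ext.value = *primary_id;
        create.exec_queue_id = 0;
        if (ioctl(fd, DRM_IOCTL_XE_EXEC_QUEUE_CREATE, &create))
            return -1;
        *secondary_id = create.exec_queue_id;
        return 0;
    }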
1277 1269 * 1278 1270 * The example below shows how to use @drm_xe_exec_queue_create to create 1279 1271 * a simple exec_queue (no parallel submission) of class ··· 1328 1292 #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY 0 1329 1293 #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE 1 1330 1294 #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE 2 1295 + #define DRM_XE_EXEC_QUEUE_SET_HANG_REPLAY_STATE 3 1296 + #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP 4 1297 + #define DRM_XE_MULTI_GROUP_CREATE (1ull << 63) 1298 + #define DRM_XE_MULTI_GROUP_KEEP_ACTIVE (1ull << 62) 1299 + #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY 5 1331 1300 /** @extensions: Pointer to the first extension struct, if any */ 1332 1301 __u64 extensions; 1333 1302 ··· 1696 1655 1697 1656 /** @DRM_XE_OA_UNIT_TYPE_OAM_SAG: OAM_SAG OA unit */ 1698 1657 DRM_XE_OA_UNIT_TYPE_OAM_SAG, 1658 + 1659 + /** @DRM_XE_OA_UNIT_TYPE_MERT: MERT OA unit */ 1660 + DRM_XE_OA_UNIT_TYPE_MERT, 1699 1661 }; 1700 1662 1701 1663 /** ··· 1721 1677 #define DRM_XE_OA_CAPS_OA_BUFFER_SIZE (1 << 2) 1722 1678 #define DRM_XE_OA_CAPS_WAIT_NUM_REPORTS (1 << 3) 1723 1679 #define DRM_XE_OA_CAPS_OAM (1 << 4) 1680 + #define DRM_XE_OA_CAPS_OA_UNIT_GT_ID (1 << 5) 1724 1681 1725 1682 /** @oa_timestamp_freq: OA timestamp freq */ 1726 1683 __u64 oa_timestamp_freq; 1727 1684 1685 + /** @gt_id: gt id for this OA unit */ 1686 + __u16 gt_id; 1687 + 1688 + /** @reserved1: MBZ */ 1689 + __u16 reserved1[3]; 1690 + 1728 1691 /** @reserved: MBZ */ 1729 - __u64 reserved[4]; 1692 + __u64 reserved[3]; 1730 1693 1731 1694 /** @num_engines: number of engines in @eci array */ 1732 1695 __u64 num_engines; ··· 2323 2272 /** @reserved: Reserved */ 2324 2273 __u64 reserved[2]; 2325 2274 2275 + }; 2276 + 2277 + /** 2278 + * struct drm_xe_exec_queue_set_property - exec queue set property 2279 + * 2280 + * Sets execution queue properties dynamically. 2281 + * Currently only %DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY 2282 + * property can be dynamically set. 2283 + */ 2284 + struct drm_xe_exec_queue_set_property { 2285 + /** @extensions: Pointer to the first extension struct, if any */ 2286 + __u64 extensions; 2287 + 2288 + /** @exec_queue_id: Exec queue ID */ 2289 + __u32 exec_queue_id; 2290 + 2291 + /** @property: property to set */ 2292 + __u32 property; 2293 + 2294 + /** @value: property value */ 2295 + __u64 value; 2296 + 2297 + /** @reserved: Reserved */ 2298 + __u64 reserved[2]; 2326 2299 }; 2327 2300 2328 2301 #if defined(__cplusplus)
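Editor's note: the new DRM_IOCTL_XE_EXEC_QUEUE_SET_PROPERTY ioctl and struct drm_xe_exec_queue_set_property added above currently accept only the multi-queue priority property. A minimal, hypothetical wrapper (the function name and fd are assumptions):

    /* Hypothetical sketch: bump a queue's priority within its multi-queue
     * group using the ioctl added in this hunk. */
    #include <sys/ioctl.h>
    #include <drm/xe_drm.h>

    static int set_multi_queue_priority(int fd, __u32 exec_queue_id, __u64 prio)
    {
        struct drm_xe_exec_queue_set_property args = {
            .exec_queue_id = exec_queue_id,
            .property = DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY,
            .value = prio,  /* documented range 0-2, default 1 */
        };

        return ioctl(fd, DRM_IOCTL_XE_EXEC_QUEUE_SET_PROPERTY, &args);
    }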
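Editor's note: because the new struct drm_xe_oa_unit::gt_id lands in previously reserved space, userspace should gate on the DRM_XE_OA_CAPS_OA_UNIT_GT_ID capability bit before trusting it, so the same code still behaves on older kernels. A small illustrative helper (the function name and the already-queried OA unit pointer are assumptions):

    /* Hypothetical helper: only trust @gt_id when the kernel advertises it. */
    #include <drm/xe_drm.h>

    static int oa_unit_gt_id(const struct drm_xe_oa_unit *unit)
    {
        if (unit->capabilities & DRM_XE_OA_CAPS_OA_UNIT_GT_ID)
            return unit->gt_id;
        return -1;  /* gt_id not populated by this kernel */
    }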