Merge tag 'drm-xe-next-2025-03-07' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next

+107

Documentation/gpu/rfc/gpusvm.rst

···
+.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)
+
+===============
+GPU SVM Section
+===============
+
+Agreed upon design principles
+=============================
+
+* migrate_to_ram path
+	* Rely only on core MM concepts (migration PTEs, page references, and
+	  page locking).
+	* No driver-specific locks other than locks for hardware interaction in
+	  this path. Such locks are not required, and inventing driver-defined
+	  locks to seal core MM races is generally a bad idea.
+	* An example of a driver-specific lock causing issues occurred before
+	  fixing do_swap_page to lock the faulting page. A driver-exclusive lock
+	  in migrate_to_ram produced a stable livelock if enough threads read
+	  the faulting page.
+	* Partial migration is supported (i.e., a subset of pages attempting to
+	  migrate can actually migrate, with only the faulting page guaranteed
+	  to migrate).
+	* Driver handles mixed migrations via retry loops rather than locking.
+* Eviction
+	* Eviction is defined as migrating data from the GPU back to the
+	  CPU without a virtual address to free up GPU memory.
+	* Only looking at physical memory data structures and locks as opposed to
+	  looking at virtual memory data structures and locks.
+	* No looking at mm/vma structs or relying on those being locked.
+	* The rationale for the above two points is that CPU virtual addresses
+	  can change at any moment, while the physical pages remain stable.
+	* GPU page table invalidation, which requires a GPU virtual address, is
+	  handled via the notifier that has access to the GPU virtual address.
+* GPU fault side
+	* The mmap_read lock is only used around core MM functions which
+	  require it; drivers should strive to take the mmap_read lock only in
+	  the GPU SVM layer.
+	* Big retry loop to handle all races with the mmu notifier under the gpu
+	  pagetable locks/mmu notifier range lock/whatever we end up calling
+	  those.
+	* Races (especially against concurrent eviction or migrate_to_ram)
+	  should not be handled on the fault side by trying to hold locks;
+	  rather, they should be handled using retry loops. One possible
+	  exception is holding a BO's dma-resv lock during the initial migration
+	  to VRAM, as this is a well-defined lock that can be taken underneath
+	  the mmap_read lock.
+	* One possible issue with the above approach is if a driver has a strict
+	  migration policy requiring GPU access to occur in GPU memory.
+	  Concurrent CPU access could cause a livelock due to endless retries.
+	  While no current user (Xe) of GPU SVM has such a policy, it is likely
+	  to be added in the future. Ideally, this should be resolved on the
+	  core-MM side rather than through a driver-side lock.
+* Physical memory to virtual backpointer
+	* This does not work, as no pointers from physical memory to virtual
+	  memory should exist. mremap() is an example of the core MM updating
+	  the virtual address without notifying the driver of the address
+	  change; the driver only receives the invalidation notifier.
+	* The physical memory backpointer (page->zone_device_data) should remain
+	  stable from allocation to page free. Safely updating this against a
+	  concurrent user would be very difficult unless the page is free.
+* GPU pagetable locking
+	* Notifier lock only protects range tree, pages valid state for a range
+	  (rather than seqno due to wider notifiers), pagetable entries, and
+	  mmu notifier seqno tracking; it is not a global lock to protect
+	  against races.
+	* All races handled with big retry as mentioned above.
+
+Overview of baseline design
+===========================
+
+The baseline design is as simple as possible to get a working baseline which
+can be built upon.
+
+.. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
+   :doc: Overview
+   :doc: Locking
+   :doc: Migration
+   :doc: Partial Unmapping of Ranges
+   :doc: Examples
+
+Possible future design features
+===============================
+
+* Concurrent GPU faults
+	* CPU faults are concurrent, so it makes sense to have concurrent GPU
+	  faults.
+	* Should be possible with fine-grained locking in the driver GPU
+	  fault handler.
+	* No expected GPU SVM changes required.
+* Ranges with mixed system and device pages
+	* Can be added if required to drm_gpusvm_get_pages fairly easily.
+* Multi-GPU support
+	* Work in progress and patches expected after initially landing GPU
+	  SVM.
+	* Ideally can be done with little to no changes to GPU SVM.
+* Drop ranges in favor of radix tree
+	* May be desirable for faster notifiers.
+* Compound device pages
+	* Nvidia, AMD, and Intel have all agreed that expensive core MM
+	  functions in the migrate device layer are a performance bottleneck;
+	  having compound device pages should help increase performance by
+	  reducing the number of these expensive calls.
+* Higher order dma mapping for migration
+	* 4k dma mapping adversely affects migration performance on Intel
+	  hardware; higher order (2M) dma mapping should help here.
+* Build common userptr implementation on top of GPU SVM
+* Driver side madvise implementation and migration policies
+* Pull in pending dma-mapping API changes from Leon / Nvidia when these land
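
The "big retry loop" principle in the design notes above (handle races by rechecking and retrying instead of adding driver locks) can be illustrated outside the kernel. The following is a minimal single-threaded userspace sketch, not GPU SVM code: `fake_notifier`, `fake_invalidate`, `fake_get_pages`, `fake_pages_valid`, and `fake_fault` are hypothetical names invented for illustration, and the "concurrent" invalidations are injected deterministically so the retry behavior is visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for notifier state; not a GPU SVM structure. */
struct fake_notifier {
	unsigned long seq;	/* bumped on every invalidation */
	int races_to_inject;	/* demo only: invalidations to simulate */
};

/* Simulates the MMU notifier firing: CPU mappings changed. */
static void fake_invalidate(struct fake_notifier *n)
{
	n->seq++;
}

/*
 * Simulates collecting pages without holding the notifier lock. The
 * sequence number is sampled first; an injected invalidation may then
 * race with the collection and make the sample stale.
 */
static unsigned long fake_get_pages(struct fake_notifier *n)
{
	unsigned long seq = n->seq;

	if (n->races_to_inject > 0) {
		n->races_to_inject--;
		fake_invalidate(n);
	}
	return seq;
}

/* In a real driver this recheck would run under the notifier lock. */
static bool fake_pages_valid(const struct fake_notifier *n, unsigned long seq)
{
	return n->seq == seq;
}

/* Fault handler sketch: returns the number of attempts it needed. */
static int fake_fault(struct fake_notifier *n)
{
	int attempts = 0;

	for (;;) {
		unsigned long seq;

		attempts++;
		assert(attempts < 1000);	/* demo guard against livelock */

		seq = fake_get_pages(n);
		if (fake_pages_valid(n, seq))
			return attempts;	/* safe to commit the binding */
		/* Raced with an invalidation: retry, take no extra locks. */
	}
}
```

With two injected invalidations, `fake_fault()` needs three attempts; with none, it succeeds on the first. The livelock concern raised in the notes corresponds to a case where invalidations never stop arriving, which this sketch only guards against with an assertion.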

+4

Documentation/gpu/rfc/index.rst

···
 
 .. toctree::
 
+    gpusvm.rst
+
+.. toctree::
+
     i915_gem_lmem.rst
 
 .. toctree::

+3

drivers/base/component.c

···
 {
 	WARN_ON(!component->bound);
 
+	dev_dbg(adev->parent, "unbinding %s component %p (ops %ps)\n",
+		dev_name(component->dev), component, component->ops);
+
 	if (component->ops && component->ops->unbind)
 		component->ops->unbind(component->dev, adev->parent, data);
 	component->bound = false;

+11 -1

drivers/base/devres.c

···
 }
 EXPORT_SYMBOL_GPL(devres_open_group);
 
-/* Find devres group with ID @id. If @id is NULL, look for the latest. */
+/*
+ * Find devres group with ID @id. If @id is NULL, look for the latest open
+ * group.
+ */
 static struct devres_group *find_group(struct device *dev, void *id)
 {
 	struct devres_node *node;
···
 		spin_unlock_irqrestore(&dev->devres_lock, flags);
 
 		release_nodes(dev, &todo);
+	} else if (list_empty(&dev->devres_head)) {
+		/*
+		 * dev is probably dying via devres_release_all(): groups
+		 * have already been removed and are in the process of
+		 * being released - don't touch and don't warn.
+		 */
+		spin_unlock_irqrestore(&dev->devres_lock, flags);
 	} else {
 		WARN_ON(1);
 		spin_unlock_irqrestore(&dev->devres_lock, flags);
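
The second devres.c hunk turns a two-way decision into a three-way one. As a rough illustration of that control flow only, here is a small standalone sketch; `release_action` and `classify_group_release` are hypothetical names and do not exist in drivers/base/devres.c, and the two booleans stand in for the result of `find_group()` and `list_empty(&dev->devres_head)`.

```c
#include <stdbool.h>

/* Hypothetical outcome labels for the branches in the hunk above. */
enum release_action {
	RELEASE_GROUP,	/* group found: release its resources */
	SKIP_SILENTLY,	/* resource list empty: device is probably dying */
	WARN_MISSING,	/* group missing on a live device: likely a caller bug */
};

/*
 * Mirrors the order of the checks in the patched code: a found group
 * always wins, and the empty-list case is only considered when no
 * group was found.
 */
static enum release_action
classify_group_release(bool group_found, bool devres_list_empty)
{
	if (group_found)
		return RELEASE_GROUP;
	if (devres_list_empty)
		return SKIP_SILENTLY;
	return WARN_MISSING;
}
```

The point of the new middle branch is that a missing group is only worth a warning when the device still has resources registered.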

+9

drivers/gpu/drm/Kconfig

···
 	  GPU-VM representation providing helpers to manage a GPUs virtual
 	  address space
 
+config DRM_GPUSVM
+	tristate
+	depends on DRM && DEVICE_PRIVATE
+	select HMM_MIRROR
+	select MMU_NOTIFIER
+	help
+	  GPU-SVM representation providing helpers to manage a GPUs shared
+	  virtual memory
+
 config DRM_BUDDY
 	tristate
 	depends on DRM

+1

drivers/gpu/drm/Makefile

···
 #
 obj-$(CONFIG_DRM_EXEC) += drm_exec.o
 obj-$(CONFIG_DRM_GPUVM) += drm_gpuvm.o
+obj-$(CONFIG_DRM_GPUSVM) += drm_gpusvm.o
 
 obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
 

+2236

drivers/gpu/drm/drm_gpusvm.c

···
+// SPDX-License-Identifier: GPL-2.0-only OR MIT
+/*
+ * Copyright © 2024 Intel Corporation
+ *
+ * Authors:
+ *     Matthew Brost <matthew.brost@intel.com>
+ */
+
+#include <linux/dma-mapping.h>
+#include <linux/hmm.h>
+#include <linux/memremap.h>
+#include <linux/migrate.h>
+#include <linux/mm_types.h>
+#include <linux/pagemap.h>
+#include <linux/slab.h>
+
+#include <drm/drm_device.h>
+#include <drm/drm_gpusvm.h>
+#include <drm/drm_pagemap.h>
+#include <drm/drm_print.h>
+
+/**
+ * DOC: Overview
+ *
+ * GPU Shared Virtual Memory (GPU SVM) layer for the Direct Rendering Manager (DRM)
+ *
+ * The GPU SVM layer is a component of the DRM framework designed to manage shared
+ * virtual memory between the CPU and GPU. It enables efficient data exchange and
+ * processing for GPU-accelerated applications by allowing memory sharing and
+ * synchronization between the CPU's and GPU's virtual address spaces.
+ *
+ * Key GPU SVM Components:
+ * - Notifiers: Used for tracking memory intervals and notifying the
+ *              GPU of changes. Notifiers are sized based on a GPU SVM
+ *              initialization parameter, with a recommendation of 512M or
+ *              larger. They maintain a Red-Black tree and a list of ranges that
+ *              fall within the notifier interval. Notifiers are tracked within
+ *              a GPU SVM Red-Black tree and list and are dynamically inserted
+ *              or removed as ranges within the interval are created or
+ *              destroyed.
+ * - Ranges: Represent memory ranges mapped in a DRM device and managed
+ *           by GPU SVM. They are sized based on an array of chunk sizes, which
+ *           is a GPU SVM initialization parameter, and the CPU address space.
+ *           Upon GPU fault, the largest aligned chunk that fits within the
+ *           faulting CPU address space is chosen for the range size. Ranges are
+ *           expected to be dynamically allocated on GPU fault and removed on an
+ *           MMU notifier UNMAP event. As mentioned above, ranges are tracked in
+ *           a notifier's Red-Black tree.
+ * - Operations: Define the interface for driver-specific GPU SVM operations
+ *               such as range allocation, notifier allocation, and
+ *               invalidations.
+ * - Device Memory Allocations: Embedded structure containing enough information
+ *                              for GPU SVM to migrate to / from device memory.
+ * - Device Memory Operations: Define the interface for driver-specific device
+ *                             memory operations such as releasing memory,
+ *                             populating pfns, and copying to / from device
+ *                             memory.
+ *
+ * This layer provides interfaces for allocating, mapping, migrating, and
+ * releasing memory ranges between the CPU and GPU. It handles all core memory
+ * management interactions (DMA mapping, HMM, and migration) and provides
+ * driver-specific virtual functions (vfuncs). This infrastructure is sufficient
+ * to build the expected driver components for an SVM implementation as detailed
+ * below.
+ *
+ * Expected Driver Components:
+ * - GPU page fault handler: Used to create ranges and notifiers based on the
+ *                           fault address, optionally migrate the range to
+ *                           device memory, and create GPU bindings.
+ * - Garbage collector: Used to unmap and destroy GPU bindings for ranges.
+ *                      Ranges are expected to be added to the garbage collector
+ *                      upon a MMU_NOTIFY_UNMAP event in notifier callback.
+ * - Notifier callback: Used to invalidate and DMA unmap GPU bindings for
+ *                      ranges.
+ */
+
+/**
+ * DOC: Locking
+ *
+ * GPU SVM handles locking for core MM interactions, i.e., it locks/unlocks the
+ * mmap lock as needed.
+ *
+ * GPU SVM introduces a global notifier lock, which safeguards the notifier's
+ * range RB tree and list, as well as the range's DMA mappings and sequence
+ * number. GPU SVM manages all necessary locking and unlocking operations,
+ * except for the recheck range's pages being valid
+ * (drm_gpusvm_range_pages_valid) when the driver is committing GPU bindings. This
+ * lock corresponds to the 'driver->update' lock mentioned in the HMM
+ * documentation (TODO: Link). Future revisions may transition from a GPU SVM
+ * global lock to a per-notifier lock if finer-grained locking is deemed
+ * necessary.
+ *
+ * In addition to the locking mentioned above, the driver should implement a
+ * lock to safeguard core GPU SVM function calls that modify state, such as
+ * drm_gpusvm_range_find_or_insert and drm_gpusvm_range_remove. This lock is
+ * denoted as 'driver_svm_lock' in code examples. Finer-grained driver side
+ * locking should also be possible for concurrent GPU fault processing within a
+ * single GPU SVM. The 'driver_svm_lock' can be set via
+ * drm_gpusvm_driver_set_lock to add annotations to GPU SVM.
+ */
+
+/**
+ * DOC: Migration
+ *
+ * The migration support is quite simple, allowing migration between RAM and
+ * device memory at the range granularity. For example, GPU SVM currently does not
+ * support mixing RAM and device memory pages within a range. This means that upon GPU
+ * fault, the entire range can be migrated to device memory, and upon CPU fault, the
+ * entire range is migrated to RAM. Mixed RAM and device memory storage within a range
+ * could be added in the future if required.
+ *
+ * The reasoning for only supporting range granularity is as follows: it
+ * simplifies the implementation, and range sizes are driver-defined and should
+ * be relatively small.
+ */
+
+/**
+ * DOC: Partial Unmapping of Ranges
+ *
+ * Partial unmapping of ranges (e.g., 1M out of 2M is unmapped by CPU resulting
+ * in MMU_NOTIFY_UNMAP event) presents several challenges, with the main one
+ * being that a subset of the range still has CPU and GPU mappings. If the
+ * backing store for the range is in device memory, a subset of the backing store has
+ * references. One option would be to split the range and device memory backing store,
+ * but the implementation for this would be quite complicated. Given that
+ * partial unmappings are rare and driver-defined range sizes are relatively
+ * small, GPU SVM does not support splitting of ranges.
+ *
+ * With no support for range splitting, upon partial unmapping of a range, the
+ * driver is expected to invalidate and destroy the entire range. If the range
+ * has device memory as its backing, the driver is also expected to migrate any
+ * remaining pages back to RAM.
+ */
+
+/**
+ * DOC: Examples
+ *
+ * This section provides three examples of how to build the expected driver
+ * components: the GPU page fault handler, the garbage collector, and the
+ * notifier callback.
+ *
+ * The generic code provided does not include logic for complex migration
+ * policies, optimized invalidations, fine-grained driver locking, or other
+ * potentially required driver locking (e.g., DMA-resv locks).
+ *
+ * 1) GPU page fault handler
+ *
+ *	int driver_bind_range(struct drm_gpusvm *gpusvm, struct drm_gpusvm_range *range)
+ *	{
+ *		int err = 0;
+ *
+ *		driver_alloc_and_setup_memory_for_bind(gpusvm, range);
+ *
+ *		drm_gpusvm_notifier_lock(gpusvm);
+ *		if (drm_gpusvm_range_pages_valid(range))
+ *			driver_commit_bind(gpusvm, range);
+ *		else
+ *			err = -EAGAIN;
+ *		drm_gpusvm_notifier_unlock(gpusvm);
+ *
+ *		return err;
+ *	}
+ *
+ *	int driver_gpu_fault(struct drm_gpusvm *gpusvm, unsigned long fault_addr,
+ *			     unsigned long gpuva_start, unsigned long gpuva_end)
+ *	{
+ *		struct drm_gpusvm_ctx ctx = {};
+ *		struct drm_gpusvm_range *range;
+ *		struct drm_gpusvm_devmem *devmem;
+ *		int err;
+ *
+ *		driver_svm_lock();
+ *	retry:
+ *		// Always process UNMAPs first so view of GPU SVM ranges is current
+ *		driver_garbage_collector(gpusvm);
+ *
+ *		range = drm_gpusvm_range_find_or_insert(gpusvm, fault_addr,
+ *							gpuva_start, gpuva_end,
+ *							&ctx);
+ *		if (IS_ERR(range)) {
+ *			err = PTR_ERR(range);
+ *			goto unlock;
+ *		}
+ *
+ *		if (driver_migration_policy(range)) {
+ *			mmap_read_lock(gpusvm->mm);
+ *			devmem = driver_alloc_devmem();
+ *			err = drm_gpusvm_migrate_to_devmem(gpusvm, range,
+ *							   devmem,
+ *							   &ctx);
+ *			mmap_read_unlock(gpusvm->mm);
+ *			if (err)	// CPU mappings may have changed
+ *				goto retry;
+ *		}
+ *
+ *		err = drm_gpusvm_range_get_pages(gpusvm, range, &ctx);
+ *		if (err == -EOPNOTSUPP || err == -EFAULT || err == -EPERM) {	// CPU mappings changed
+ *			if (err == -EOPNOTSUPP)
+ *				drm_gpusvm_range_evict(gpusvm, range);
+ *			goto retry;
+ *		} else if (err) {
+ *			goto unlock;
+ *		}
+ *
+ *		err = driver_bind_range(gpusvm, range);
+ *		if (err == -EAGAIN)	// CPU mappings changed
+ *			goto retry;
+ *
+ *	unlock:
+ *		driver_svm_unlock();
+ *		return err;
+ *	}
+ *
+ * 2) Garbage Collector.
+ *
+ *	void __driver_garbage_collector(struct drm_gpusvm *gpusvm,
+ *					struct drm_gpusvm_range *range)
+ *	{
+ *		assert_driver_svm_locked(gpusvm);
+ *
+ *		// Partial unmap, migrate any remaining device memory pages back to RAM
+ *		if (range->flags.partial_unmap)
+ *			drm_gpusvm_range_evict(gpusvm, range);
+ *
+ *		driver_unbind_range(range);
+ *		drm_gpusvm_range_remove(gpusvm, range);
+ *	}
+ *
+ *	void driver_garbage_collector(struct drm_gpusvm *gpusvm)
+ *	{
+ *		assert_driver_svm_locked(gpusvm);
+ *
+ *		for_each_range_in_garbage_collector(gpusvm, range)
+ *			__driver_garbage_collector(gpusvm, range);
+ *	}
+ *
+ * 3) Notifier callback.
+ *
+ *	void driver_invalidation(struct drm_gpusvm *gpusvm,
+ *				 struct drm_gpusvm_notifier *notifier,
+ *				 const struct mmu_notifier_range *mmu_range)
+ *	{
+ *		struct drm_gpusvm_ctx ctx = { .in_notifier = true, };
+ *		struct drm_gpusvm_range *range = NULL;
+ *
+ *		driver_invalidate_device_pages(gpusvm, mmu_range->start, mmu_range->end);
+ *
+ *		drm_gpusvm_for_each_range(range, notifier, mmu_range->start,
+ *					  mmu_range->end) {
+ *			drm_gpusvm_range_unmap_pages(gpusvm, range, &ctx);
+ *
+ *			if (mmu_range->event != MMU_NOTIFY_UNMAP)
+ *				continue;
+ *
+ *			drm_gpusvm_range_set_unmapped(range, mmu_range);
+ *			driver_garbage_collector_add(gpusvm, range);
+ *		}
+ *	}
+ */
+
+/**
+ * npages_in_range() - Calculate the number of pages in a given range
+ * @start: The start address of the range
+ * @end: The end address of the range
+ *
+ * This function calculates the number of pages in a given memory range,
+ * specified by the start and end addresses. It divides the difference
+ * between the end and start addresses by the page size (PAGE_SIZE) to
+ * determine the number of pages in the range.
+ *
+ * Return: The number of pages in the specified range.
+ */
+static unsigned long
+npages_in_range(unsigned long start, unsigned long end)
+{
+	return (end - start) >> PAGE_SHIFT;
+}
+
+/**
+ * struct drm_gpusvm_zdd - GPU SVM zone device data
+ *
+ * @refcount: Reference count for the zdd
+ * @devmem_allocation: device memory allocation
+ * @device_private_page_owner: Device private pages owner
+ *
+ * This structure serves as a generic wrapper installed in
+ * page->zone_device_data. It provides infrastructure for looking up a device
+ * memory allocation upon CPU page fault and asynchronously releasing device
+ * memory once the CPU has no page references. Asynchronous release is useful
+ * because CPU page references can be dropped in IRQ contexts, while releasing
+ * device memory likely requires sleeping locks.
+ */
+struct drm_gpusvm_zdd {
+	struct kref refcount;
+	struct drm_gpusvm_devmem *devmem_allocation;
+	void *device_private_page_owner;
+};
+
+/**
+ * drm_gpusvm_zdd_alloc() - Allocate a zdd structure.
+ * @device_private_page_owner: Device private pages owner
+ *
+ * This function allocates and initializes a new zdd structure. It sets up the
+ * reference count and records the device private page owner.
+ *
+ * Return: Pointer to the allocated zdd on success, NULL on failure.
+ */
+static struct drm_gpusvm_zdd *
+drm_gpusvm_zdd_alloc(void *device_private_page_owner)
+{
+	struct drm_gpusvm_zdd *zdd;
+
+	zdd = kmalloc(sizeof(*zdd), GFP_KERNEL);
+	if (!zdd)
+		return NULL;
+
+	kref_init(&zdd->refcount);
+	zdd->devmem_allocation = NULL;
+	zdd->device_private_page_owner = device_private_page_owner;
+
+	return zdd;
+}
+
+/**
+ * drm_gpusvm_zdd_get() - Get a reference to a zdd structure.
+ * @zdd: Pointer to the zdd structure.
+ *
+ * This function increments the reference count of the provided zdd structure.
+ *
+ * Return: Pointer to the zdd structure.
+ */
+static struct drm_gpusvm_zdd *drm_gpusvm_zdd_get(struct drm_gpusvm_zdd *zdd)
+{
+	kref_get(&zdd->refcount);
+	return zdd;
+}
+
+/**
+ * drm_gpusvm_zdd_destroy() - Destroy a zdd structure.
+ * @ref: Pointer to the reference count structure.
+ *
+ * This function signals that the device memory allocation, if any, is detached,
+ * calls the driver's devmem_release vfunc if one is provided, and frees the zdd.
+ */
+static void drm_gpusvm_zdd_destroy(struct kref *ref)
+{
+	struct drm_gpusvm_zdd *zdd =
+		container_of(ref, struct drm_gpusvm_zdd, refcount);
+	struct drm_gpusvm_devmem *devmem = zdd->devmem_allocation;
+
+	if (devmem) {
+		complete_all(&devmem->detached);
+		if (devmem->ops->devmem_release)
+			devmem->ops->devmem_release(devmem);
+	}
+	kfree(zdd);
+}
+
+/**
+ * drm_gpusvm_zdd_put() - Put a zdd reference.
+ * @zdd: Pointer to the zdd structure.
+ *
+ * This function decrements the reference count of the provided zdd structure
+ * and destroys it if the count drops to zero.
+ */
+static void drm_gpusvm_zdd_put(struct drm_gpusvm_zdd *zdd)
+{
+	kref_put(&zdd->refcount, drm_gpusvm_zdd_destroy);
+}
+
+/**
+ * drm_gpusvm_range_find() - Find GPU SVM range from GPU SVM notifier
+ * @notifier: Pointer to the GPU SVM notifier structure.
+ * @start: Start address of the range
+ * @end: End address of the range
+ *
+ * Return: A pointer to the drm_gpusvm_range if found or NULL
+ */
+struct drm_gpusvm_range *
+drm_gpusvm_range_find(struct drm_gpusvm_notifier *notifier, unsigned long start,
+		      unsigned long end)
+{
+	struct interval_tree_node *itree;
+
+	itree = interval_tree_iter_first(&notifier->root, start, end - 1);
+
+	if (itree)
+		return container_of(itree, struct drm_gpusvm_range, itree);
+	else
+		return NULL;
+}
+EXPORT_SYMBOL_GPL(drm_gpusvm_range_find);
+
+/**
+ * drm_gpusvm_for_each_range_safe() - Safely iterate over GPU SVM ranges in a notifier
+ * @range__: Iterator variable for the ranges
+ * @next__: Iterator variable for the ranges temporary storage
+ * @notifier__: Pointer to the GPU SVM notifier
+ * @start__: Start address of the range
+ * @end__: End address of the range
+ *
+ * This macro is used to iterate over GPU SVM ranges in a notifier while
+ * removing ranges from it.
+ */
+#define drm_gpusvm_for_each_range_safe(range__, next__, notifier__, start__, end__)	\
+	for ((range__) = drm_gpusvm_range_find((notifier__), (start__), (end__)),	\
+	     (next__) = __drm_gpusvm_range_next(range__);	\
+	     (range__) && (drm_gpusvm_range_start(range__) < (end__));	\
+	     (range__) = (next__), (next__) = __drm_gpusvm_range_next(range__))
+
+/**
+ * __drm_gpusvm_notifier_next() - get the next drm_gpusvm_notifier in the list
+ * @notifier: a pointer to the current drm_gpusvm_notifier
+ *
+ * Return: A pointer to the next drm_gpusvm_notifier if available, or NULL if
+ *         the current notifier is the last one or if the input notifier is
+ *         NULL.
+ */
+static struct drm_gpusvm_notifier *
+__drm_gpusvm_notifier_next(struct drm_gpusvm_notifier *notifier)
+{
+	if (notifier && !list_is_last(&notifier->entry,
+				      &notifier->gpusvm->notifier_list))
+		return list_next_entry(notifier, entry);
+
+	return NULL;
+}
+
+static struct drm_gpusvm_notifier *
+notifier_iter_first(struct rb_root_cached *root, unsigned long start,
+		    unsigned long last)
+{
+	struct interval_tree_node *itree;
+
+	itree = interval_tree_iter_first(root, start, last);
+
+	if (itree)
+		return container_of(itree, struct drm_gpusvm_notifier, itree);
+	else
+		return NULL;
+}
+
+/**
+ * drm_gpusvm_for_each_notifier() - Iterate over GPU SVM notifiers in a gpusvm
+ * @notifier__: Iterator variable for the notifiers
+ * @gpusvm__: Pointer to the GPU SVM structure
+ * @start__: Start address of the notifier
+ * @end__: End address of the notifier
+ *
+ * This macro is used to iterate over GPU SVM notifiers in a gpusvm.
+ */
+#define drm_gpusvm_for_each_notifier(notifier__, gpusvm__, start__, end__)	\
+	for ((notifier__) = notifier_iter_first(&(gpusvm__)->root, (start__), (end__) - 1);	\
+	     (notifier__) && (drm_gpusvm_notifier_start(notifier__) < (end__));	\
+	     (notifier__) = __drm_gpusvm_notifier_next(notifier__))
+
+/**
+ * drm_gpusvm_for_each_notifier_safe() - Safely iterate over GPU SVM notifiers in a gpusvm
+ * @notifier__: Iterator variable for the notifiers
+ * @next__: Iterator variable for the notifiers temporary storage
+ * @gpusvm__: Pointer to the GPU SVM structure
+ * @start__: Start address of the notifier
+ * @end__: End address of the notifier
+ *
+ * This macro is used to iterate over GPU SVM notifiers in a gpusvm while
+ * removing notifiers from it.
+ */
+#define drm_gpusvm_for_each_notifier_safe(notifier__, next__, gpusvm__, start__, end__)	\
+	for ((notifier__) = notifier_iter_first(&(gpusvm__)->root, (start__), (end__) - 1),	\
+	     (next__) = __drm_gpusvm_notifier_next(notifier__);	\
+	     (notifier__) && (drm_gpusvm_notifier_start(notifier__) < (end__));	\
+	     (notifier__) = (next__), (next__) = __drm_gpusvm_notifier_next(notifier__))
+
+/**
+ * drm_gpusvm_notifier_invalidate() - Invalidate a GPU SVM notifier.
+ * @mni: Pointer to the mmu_interval_notifier structure.
+ * @mmu_range: Pointer to the mmu_notifier_range structure.
+ * @cur_seq: Current sequence number.
+ *
+ * This function serves as a generic MMU notifier for GPU SVM. It sets the MMU
+ * notifier sequence number and calls the driver invalidate vfunc under
+ * gpusvm->notifier_lock.
+ *
+ * Return: true if the operation succeeds, false otherwise.
+ */
+static bool
+drm_gpusvm_notifier_invalidate(struct mmu_interval_notifier *mni,
+			       const struct mmu_notifier_range *mmu_range,
+			       unsigned long cur_seq)
+{
+	struct drm_gpusvm_notifier *notifier =
+		container_of(mni, typeof(*notifier), notifier);
+	struct drm_gpusvm *gpusvm = notifier->gpusvm;
+
+	if (!mmu_notifier_range_blockable(mmu_range))
+		return false;
+
+	down_write(&gpusvm->notifier_lock);
+	mmu_interval_set_seq(mni, cur_seq);
+	gpusvm->ops->invalidate(gpusvm, notifier, mmu_range);
+	up_write(&gpusvm->notifier_lock);
+
+	return true;
+}
+
+/**
+ * drm_gpusvm_notifier_ops - MMU interval notifier operations for GPU SVM
+ */
+static const struct mmu_interval_notifier_ops drm_gpusvm_notifier_ops = {
+	.invalidate = drm_gpusvm_notifier_invalidate,
+};
+
+/**
+ * drm_gpusvm_init() - Initialize the GPU SVM.
+ * @gpusvm: Pointer to the GPU SVM structure.
+ * @name: Name of the GPU SVM.
+ * @drm: Pointer to the DRM device structure.
+ * @mm: Pointer to the mm_struct for the address space.
+ * @device_private_page_owner: Device private pages owner.
+ * @mm_start: Start address of GPU SVM.
+ * @mm_range: Range of the GPU SVM.
+ * @notifier_size: Size of individual notifiers.
+ * @ops: Pointer to the operations structure for GPU SVM.
+ * @chunk_sizes: Pointer to the array of chunk sizes used in range allocation.
+ *               Entries should be powers of 2 in descending order with last
+ *               entry being SZ_4K.
+ * @num_chunks: Number of chunks.
+ *
+ * This function initializes the GPU SVM.
+ *
+ * Return: 0 on success, a negative error code on failure.
+ */
+int drm_gpusvm_init(struct drm_gpusvm *gpusvm,
+		    const char *name, struct drm_device *drm,
+		    struct mm_struct *mm, void *device_private_page_owner,
+		    unsigned long mm_start, unsigned long mm_range,
+		    unsigned long notifier_size,
+		    const struct drm_gpusvm_ops *ops,
+		    const unsigned long *chunk_sizes, int num_chunks)
+{
+	if (!ops->invalidate || !num_chunks)
+		return -EINVAL;
+
+	gpusvm->name = name;
+	gpusvm->drm = drm;
+	gpusvm->mm = mm;
+	gpusvm->device_private_page_owner = device_private_page_owner;
+	gpusvm->mm_start = mm_start;
+	gpusvm->mm_range = mm_range;
+	gpusvm->notifier_size = notifier_size;
+	gpusvm->ops = ops;
+	gpusvm->chunk_sizes = chunk_sizes;
+	gpusvm->num_chunks = num_chunks;
+
+	mmgrab(mm);
+	gpusvm->root = RB_ROOT_CACHED;
+	INIT_LIST_HEAD(&gpusvm->notifier_list);
+
+	init_rwsem(&gpusvm->notifier_lock);
+
+	fs_reclaim_acquire(GFP_KERNEL);
+	might_lock(&gpusvm->notifier_lock);
+	fs_reclaim_release(GFP_KERNEL);
+
+#ifdef CONFIG_LOCKDEP
+	gpusvm->lock_dep_map = NULL;
+#endif
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(drm_gpusvm_init);
+
+/**
+ * drm_gpusvm_notifier_find() - Find GPU SVM notifier
+ * @gpusvm: Pointer to the GPU SVM structure
+ * @fault_addr: Fault address
+ *
+ * This function finds the GPU SVM notifier associated with the fault address.
+ *
+ * Return: Pointer to the GPU SVM notifier on success, NULL otherwise.
+ */
+static struct drm_gpusvm_notifier *
+drm_gpusvm_notifier_find(struct drm_gpusvm *gpusvm,
+			 unsigned long fault_addr)
+{
+	return notifier_iter_first(&gpusvm->root, fault_addr, fault_addr + 1);
+}
+
+/**
+ * to_drm_gpusvm_notifier() - retrieve the container struct for a given rbtree node
+ * @node: a pointer to the rbtree node embedded within a drm_gpusvm_notifier struct
+ *
+ * Return: A pointer to the containing drm_gpusvm_notifier structure.
+ */
+static struct drm_gpusvm_notifier *to_drm_gpusvm_notifier(struct rb_node *node)
+{
+	return container_of(node, struct drm_gpusvm_notifier, itree.rb);
+}
+
+/**
+ * drm_gpusvm_notifier_insert() - Insert GPU SVM notifier
+ * @gpusvm: Pointer to the GPU SVM structure
+ * @notifier: Pointer to the GPU SVM notifier structure
+ *
+ * This function inserts the GPU SVM notifier into the GPU SVM RB tree and list.
+ */
+static void drm_gpusvm_notifier_insert(struct drm_gpusvm *gpusvm,
+				       struct drm_gpusvm_notifier *notifier)
+{
+	struct rb_node *node;
+	struct list_head *head;
+
+	interval_tree_insert(&notifier->itree, &gpusvm->root);
+
+	node = rb_prev(&notifier->itree.rb);
+	if (node)
+		head = &(to_drm_gpusvm_notifier(node))->entry;
+	else
+		head = &gpusvm->notifier_list;
+
+	list_add(&notifier->entry, head);
+}
+
+/**
+ * drm_gpusvm_notifier_remove() - Remove GPU SVM notifier
+ * @gpusvm: Pointer to the GPU SVM structure
+ * @notifier: Pointer to the GPU SVM notifier structure
+ *
+ * This function removes the GPU SVM notifier from the GPU SVM RB tree and list.
+ */
+static void drm_gpusvm_notifier_remove(struct drm_gpusvm *gpusvm,
+				       struct drm_gpusvm_notifier *notifier)
+{
+	interval_tree_remove(&notifier->itree, &gpusvm->root);
+	list_del(&notifier->entry);
+}
+
+/**
+ * drm_gpusvm_fini() - Finalize the GPU SVM.
+ * @gpusvm: Pointer to the GPU SVM structure.
+ *
+ * This function finalizes the GPU SVM by cleaning up any remaining ranges and
+ * notifiers, and dropping a reference to struct MM.
+ */
+void drm_gpusvm_fini(struct drm_gpusvm *gpusvm)
+{
+	struct drm_gpusvm_notifier *notifier, *next;
+
+	drm_gpusvm_for_each_notifier_safe(notifier, next, gpusvm, 0, LONG_MAX) {
+		struct drm_gpusvm_range *range, *__next;
+
+		/*
+		 * Remove notifier first to avoid racing with any invalidation
+		 */
+		mmu_interval_notifier_remove(&notifier->notifier);
+		notifier->flags.removed = true;
+
+		drm_gpusvm_for_each_range_safe(range, __next, notifier, 0,
+					       LONG_MAX)
+			drm_gpusvm_range_remove(gpusvm, range);
+	}
+
+	mmdrop(gpusvm->mm);
+	WARN_ON(!RB_EMPTY_ROOT(&gpusvm->root.rb_root));
+}
+EXPORT_SYMBOL_GPL(drm_gpusvm_fini);
+
+/**
+ * drm_gpusvm_notifier_alloc() - Allocate GPU SVM notifier
+ * @gpusvm: Pointer to the GPU SVM structure
+ * @fault_addr: Fault address
+ *
+ * This function allocates and initializes the GPU SVM notifier structure.
+ *
+ * Return: Pointer to the allocated GPU SVM notifier on success, ERR_PTR() on failure.
+ */
+static struct drm_gpusvm_notifier *
+drm_gpusvm_notifier_alloc(struct drm_gpusvm *gpusvm, unsigned long fault_addr)
+{
+	struct drm_gpusvm_notifier *notifier;
+
+	if (gpusvm->ops->notifier_alloc)
+		notifier = gpusvm->ops->notifier_alloc();
+	else
+		notifier = kzalloc(sizeof(*notifier), GFP_KERNEL);
+
+	if (!notifier)
+		return ERR_PTR(-ENOMEM);
+
+	notifier->gpusvm = gpusvm;
+	notifier->itree.start = ALIGN_DOWN(fault_addr, gpusvm->notifier_size);
+	notifier->itree.last = ALIGN(fault_addr + 1, gpusvm->notifier_size) - 1;
+	INIT_LIST_HEAD(&notifier->entry);
+	notifier->root = RB_ROOT_CACHED;
+	INIT_LIST_HEAD(&notifier->range_list);
+
+	return notifier;
+}
+
+/**
+ * drm_gpusvm_notifier_free() - Free GPU SVM notifier
+ * @gpusvm: Pointer to the GPU SVM structure
+ * @notifier: Pointer to the GPU SVM notifier structure
+ *
+ * This function frees the GPU SVM notifier structure.
+ */
+static void drm_gpusvm_notifier_free(struct drm_gpusvm *gpusvm,
+				     struct drm_gpusvm_notifier *notifier)
+{
+	WARN_ON(!RB_EMPTY_ROOT(&notifier->root.rb_root));
+
+	if (gpusvm->ops->notifier_free)
+		gpusvm->ops->notifier_free(notifier);
+	else
+		kfree(notifier);
+}
+
+/**
+ * to_drm_gpusvm_range() - retrieve the container struct for a given rbtree node
+ * @node: a pointer to the rbtree node embedded within a drm_gpusvm_range struct
+ *
+ * Return: A pointer to the containing drm_gpusvm_range structure.
+ */
+static struct drm_gpusvm_range *to_drm_gpusvm_range(struct rb_node *node)
+{
+	return container_of(node, struct drm_gpusvm_range, itree.rb);
+}
+
+/**
+ * drm_gpusvm_range_insert() - Insert GPU SVM range
+ * @notifier: Pointer to the GPU SVM notifier structure
+ * @range: Pointer to the GPU SVM range structure
+ *
+ * This function inserts the GPU SVM range into the notifier RB tree and list.
+ */
+static void drm_gpusvm_range_insert(struct drm_gpusvm_notifier *notifier,
+				    struct drm_gpusvm_range *range)
+{
+	struct rb_node *node;
+	struct list_head *head;
+
+	drm_gpusvm_notifier_lock(notifier->gpusvm);
+	interval_tree_insert(&range->itree, &notifier->root);
+
+	node = rb_prev(&range->itree.rb);
+	if (node)
+		head = &(to_drm_gpusvm_range(node))->entry;
+	else
+		head = &notifier->range_list;
+
+	list_add(&range->entry, head);
+	drm_gpusvm_notifier_unlock(notifier->gpusvm);
+}
+
+/**
+ * __drm_gpusvm_range_remove() - Remove GPU SVM range
+ * @notifier: Pointer to the GPU SVM notifier structure
+ * @range: Pointer to the GPU SVM range structure
+ *
+ * This function removes the GPU SVM range from the notifier RB tree and list.
+ */
+static void __drm_gpusvm_range_remove(struct drm_gpusvm_notifier *notifier,
+				      struct drm_gpusvm_range *range)
+{
+	interval_tree_remove(&range->itree, &notifier->root);
+	list_del(&range->entry);
+}
+
+/**
+ * drm_gpusvm_range_alloc() - Allocate GPU SVM range
+ * @gpusvm: Pointer to the GPU SVM structure
+ * @notifier: Pointer to the GPU SVM notifier structure
+ * @fault_addr: Fault address
+ * @chunk_size: Chunk size
+ * @migrate_devmem: Flag indicating whether to migrate device memory
+ *
+ * This function allocates and initializes the GPU SVM range structure.
+ *
+ * Return: Pointer to the allocated GPU SVM range on success, ERR_PTR() on failure.
+ */
+static struct drm_gpusvm_range *
+drm_gpusvm_range_alloc(struct drm_gpusvm *gpusvm,
+		       struct drm_gpusvm_notifier *notifier,
+		       unsigned long fault_addr, unsigned long chunk_size,
+		       bool migrate_devmem)
+{
+	struct drm_gpusvm_range *range;
+
+	if (gpusvm->ops->range_alloc)
+		range = gpusvm->ops->range_alloc(gpusvm);
+	else
+		range = kzalloc(sizeof(*range), GFP_KERNEL);
+
+	if (!range)
+		return ERR_PTR(-ENOMEM);
+
+	kref_init(&range->refcount);
+	range->gpusvm = gpusvm;
+	range->notifier = notifier;
+	range->itree.start = ALIGN_DOWN(fault_addr, chunk_size);
+	range->itree.last = ALIGN(fault_addr + 1, chunk_size) - 1;
+	INIT_LIST_HEAD(&range->entry);
+	range->notifier_seq = LONG_MAX;
+	range->flags.migrate_devmem = migrate_devmem ? 1 : 0;
+
+	return range;
+}
+
+/**
+ * drm_gpusvm_check_pages() - Check pages
+ * @gpusvm: Pointer to the GPU SVM structure
+ * @notifier: Pointer to the GPU SVM notifier structure
+ * @start: Start address
+ * @end: End address
+ *
+ * Check if pages between start and end have been faulted in on the CPU. Used to
+ * prevent migration of pages without CPU backing store.
+ *
+ * Return: True if pages have been faulted into CPU, False otherwise
+ */
+static bool drm_gpusvm_check_pages(struct drm_gpusvm *gpusvm,
+				   struct drm_gpusvm_notifier *notifier,
+				   unsigned long start, unsigned long end)
+{
+	struct hmm_range hmm_range = {
+		.default_flags = 0,
+		.notifier = &notifier->notifier,
+		.start = start,
+		.end = end,
+		.dev_private_owner = gpusvm->device_private_page_owner,
+	};
+	unsigned long timeout =
+		jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
+	unsigned long *pfns;
+	unsigned long npages = npages_in_range(start, end);
+	int err, i;
+
+	mmap_assert_locked(gpusvm->mm);
+
+	pfns = kvmalloc_array(npages, sizeof(*pfns), GFP_KERNEL);
+	if (!pfns)
+		return false;
+
+	hmm_range.notifier_seq = mmu_interval_read_begin(&notifier->notifier);
+	hmm_range.hmm_pfns = pfns;
+
+	while (true) {
+		err = hmm_range_fault(&hmm_range);
+		if (err == -EBUSY) {
+			if (time_after(jiffies, timeout))
+				break;
+
+			hmm_range.notifier_seq =
+				mmu_interval_read_begin(&notifier->notifier);
+			continue;
+		}
+		break;
+	}
+	if (err)
+		goto err_free;
+
+	for (i = 0; i < npages;) {
+		if (!(pfns[i] & HMM_PFN_VALID)) {
+			err = -EFAULT;
+			goto err_free;
+		}
+		i += 0x1 << hmm_pfn_to_map_order(pfns[i]);
+	}
+
+err_free:
+	kvfree(pfns);
+	return err ? false : true;
+}
+
+/**
+ * drm_gpusvm_range_chunk_size() - Determine chunk size for GPU SVM range
+ * @gpusvm: Pointer to the GPU SVM structure
+ * @notifier: Pointer to the GPU SVM notifier structure
+ * @vas: Pointer to the virtual memory area structure
+ * @fault_addr: Fault address
+ * @gpuva_start: Start address of GPUVA which mirrors CPU
+ * @gpuva_end: End address of GPUVA which mirrors CPU
+ * @check_pages_threshold: Range-size threshold at or below which CPU pages are
+ *                         checked for presence
+ *
+ * This function determines the chunk size for the GPU SVM range based on the
+ * fault address, GPU SVM chunk sizes, existing GPU SVM ranges, and the virtual
+ * memory area boundaries.
+ *
+ * Return: Chunk size on success, LONG_MAX on failure.
+ */
+static unsigned long
+drm_gpusvm_range_chunk_size(struct drm_gpusvm *gpusvm,
+			    struct drm_gpusvm_notifier *notifier,
+			    struct vm_area_struct *vas,
+			    unsigned long fault_addr,
+			    unsigned long gpuva_start,
+			    unsigned long gpuva_end,
+			    unsigned long check_pages_threshold)
+{
+	unsigned long start, end;
+	int i = 0;
+
+retry:
+	for (; i < gpusvm->num_chunks; ++i) {
+		start = ALIGN_DOWN(fault_addr, gpusvm->chunk_sizes[i]);
+		end = ALIGN(fault_addr + 1, gpusvm->chunk_sizes[i]);
+
+		if (start >= vas->vm_start && end <= vas->vm_end &&
+		    start >= drm_gpusvm_notifier_start(notifier) &&
+		    end <= drm_gpusvm_notifier_end(notifier) &&
+		    start >= gpuva_start && end <= gpuva_end)
+			break;
+	}
+
+	if (i == gpusvm->num_chunks)
+		return LONG_MAX;
+
+	/*
+	 * If the allocation is larger than a page, ensure it does not overlap
+	 * with existing ranges.
+	 */
+	if (end - start != SZ_4K) {
+		struct drm_gpusvm_range *range;
+
+		range = drm_gpusvm_range_find(notifier, start, end);
+		if (range) {
+			++i;
+			goto retry;
+		}
+
+		/*
+		 * XXX: Only create range on pages CPU has faulted in. Without
+		 * this check, or prefault, on BMG 'xe_exec_system_allocator --r
+		 * process-many-malloc' fails. In the failure case, each process
+		 * mallocs 16k but the CPU VMA is ~128k which results in 64k SVM
+		 * ranges. When migrating the SVM ranges, some processes fail in
+		 * drm_gpusvm_migrate_to_devmem with 'migrate.cpages != npages'
+		 * and then upon drm_gpusvm_range_get_pages device pages from
+		 * other processes are collected + faulted in which creates all
+		 * sorts of problems. Unsure exactly how this is happening; the
+		 * problem also goes away if 'xe_exec_system_allocator --r
+		 * process-many-malloc' mallocs at least 64k at a time.
+		 */
+		if (end - start <= check_pages_threshold &&
+		    !drm_gpusvm_check_pages(gpusvm, notifier, start, end)) {
+			++i;
+			goto retry;
+		}
+	}
+
+	return end - start;
+}
+
+#ifdef CONFIG_LOCKDEP
+/**
+ * drm_gpusvm_driver_lock_held() - Assert GPU SVM driver lock is held
+ * @gpusvm: Pointer to the GPU SVM structure.
+ *
+ * Ensure driver lock is held.
+ */
+static void drm_gpusvm_driver_lock_held(struct drm_gpusvm *gpusvm)
+{
+	if ((gpusvm)->lock_dep_map)
+		lockdep_assert(lock_is_held_type((gpusvm)->lock_dep_map, 0));
+}
+#else
+static void drm_gpusvm_driver_lock_held(struct drm_gpusvm *gpusvm)
+{
+}
+#endif
+
+/**
+ * drm_gpusvm_range_find_or_insert() - Find or insert GPU SVM range
+ * @gpusvm: Pointer to the GPU SVM structure
+ * @fault_addr: Fault address
+ * @gpuva_start: Start address of GPUVA which mirrors CPU
+ * @gpuva_end: End address of GPUVA which mirrors CPU
+ * @ctx: GPU SVM context
+ *
+ * This function finds an existing GPU SVM range, or inserts a newly allocated
+ * one, based on the fault address. Caller must hold a lock to protect range
+ * lookup and insertion.
+ *
+ * Return: Pointer to the GPU SVM range on success, ERR_PTR() on failure.
+ */
+struct drm_gpusvm_range *
+drm_gpusvm_range_find_or_insert(struct drm_gpusvm *gpusvm,
+				unsigned long fault_addr,
+				unsigned long gpuva_start,
+				unsigned long gpuva_end,
+				const struct drm_gpusvm_ctx *ctx)
+{
+	struct drm_gpusvm_notifier *notifier;
+	struct drm_gpusvm_range *range;
+	struct mm_struct *mm = gpusvm->mm;
+	struct vm_area_struct *vas;
+	bool notifier_alloc = false;
+	unsigned long chunk_size;
+	int err;
+	bool migrate_devmem;
+
+	drm_gpusvm_driver_lock_held(gpusvm);
+
+	if (fault_addr < gpusvm->mm_start ||
+	    fault_addr > gpusvm->mm_start + gpusvm->mm_range)
+		return ERR_PTR(-EINVAL);
+
+	if (!mmget_not_zero(mm))
+		return ERR_PTR(-EFAULT);
+
+	notifier = drm_gpusvm_notifier_find(gpusvm, fault_addr);
+	if (!notifier) {
+		notifier = drm_gpusvm_notifier_alloc(gpusvm, fault_addr);
+		if (IS_ERR(notifier)) {
+			err = PTR_ERR(notifier);
+			goto err_mmunlock;
+		}
+		notifier_alloc = true;
+		err = mmu_interval_notifier_insert(&notifier->notifier,
+						   mm,
+						   drm_gpusvm_notifier_start(notifier),
+						   drm_gpusvm_notifier_size(notifier),
+						   &drm_gpusvm_notifier_ops);
+		if (err)
+			goto err_notifier;
+	}
+
+	mmap_read_lock(mm);
+
+	vas = vma_lookup(mm, fault_addr);
+	if (!vas) {
+		err = -ENOENT;
+		goto err_notifier_remove;
+	}
+
+	if (!ctx->read_only && !(vas->vm_flags & VM_WRITE)) {
+		err = -EPERM;
+		goto err_notifier_remove;
+	}
+
+	range = drm_gpusvm_range_find(notifier, fault_addr, fault_addr + 1);
+	if (range)
+		goto out_mmunlock;
+	/*
+	 * XXX: Short-circuiting migration based on migrate_vma_* current
+	 * limitations. If/when migrate_vma_* add more support, this logic will
+	 * have to change.
+	 */
+	migrate_devmem = ctx->devmem_possible &&
+		vma_is_anonymous(vas) && !is_vm_hugetlb_page(vas);
+
+	chunk_size = drm_gpusvm_range_chunk_size(gpusvm, notifier, vas,
+						 fault_addr, gpuva_start,
+						 gpuva_end,
+						 ctx->check_pages_threshold);
+	if (chunk_size == LONG_MAX) {
+		err = -EINVAL;
+		goto err_notifier_remove;
+	}
+
+	range = drm_gpusvm_range_alloc(gpusvm, notifier, fault_addr, chunk_size,
+				       migrate_devmem);
+	if (IS_ERR(range)) {
+		err = PTR_ERR(range);
+		goto err_notifier_remove;
+	}
+
+	drm_gpusvm_range_insert(notifier, range);
+	if (notifier_alloc)
+		drm_gpusvm_notifier_insert(gpusvm, notifier);
+
+out_mmunlock:
+	mmap_read_unlock(mm);
+	mmput(mm);
+
+	return range;
+
+err_notifier_remove:
+	mmap_read_unlock(mm);
+	if (notifier_alloc)
+		mmu_interval_notifier_remove(&notifier->notifier);
+err_notifier:
+	if (notifier_alloc)
+		drm_gpusvm_notifier_free(gpusvm, notifier);
+err_mmunlock:
+	mmput(mm);
+	return ERR_PTR(err);
+}
+EXPORT_SYMBOL_GPL(drm_gpusvm_range_find_or_insert);
+
+/**
+ * __drm_gpusvm_range_unmap_pages() - Unmap pages associated with a GPU SVM range (internal)
+ * @gpusvm: Pointer to the GPU SVM structure
+ * @range: Pointer to the GPU SVM range structure
+ * @npages: Number of pages to unmap
+ *
+ * This function unmaps pages associated with a GPU SVM range. Assumes and
+ * asserts correct locking is in place when called.
+ */
+static void __drm_gpusvm_range_unmap_pages(struct drm_gpusvm *gpusvm,
+					   struct drm_gpusvm_range *range,
+					   unsigned long npages)
+{
+	unsigned long i, j;
+	struct drm_pagemap *dpagemap = range->dpagemap;
+	struct device *dev = gpusvm->drm->dev;
+
+	lockdep_assert_held(&gpusvm->notifier_lock);
+
+	if (range->flags.has_dma_mapping) {
+		for (i = 0, j = 0; i < npages; j++) {
+			struct drm_pagemap_device_addr *addr = &range->dma_addr[j];
+
+			if (addr->proto == DRM_INTERCONNECT_SYSTEM)
+				dma_unmap_page(dev,
+					       addr->addr,
+					       PAGE_SIZE << addr->order,
+					       addr->dir);
+			else if (dpagemap && dpagemap->ops->device_unmap)
+				dpagemap->ops->device_unmap(dpagemap,
+							    dev, *addr);
+			i += 1 << addr->order;
+		}
+		range->flags.has_devmem_pages = false;
+		range->flags.has_dma_mapping = false;
+		range->dpagemap = NULL;
+	}
+}
+
+/**
+ * drm_gpusvm_range_free_pages() - Free pages associated with a GPU SVM range
+ * @gpusvm: Pointer to the GPU SVM structure
+ * @range: Pointer to the GPU SVM range structure
+ *
+ * This function frees the dma address array associated with a GPU SVM range.
+ */
+static void drm_gpusvm_range_free_pages(struct drm_gpusvm *gpusvm,
+					struct drm_gpusvm_range *range)
+{
+	lockdep_assert_held(&gpusvm->notifier_lock);
+
+	if (range->dma_addr) {
+		kvfree(range->dma_addr);
+		range->dma_addr = NULL;
+	}
+}
+
+/**
+ * drm_gpusvm_range_remove() - Remove GPU SVM range
+ * @gpusvm: Pointer to the GPU SVM structure
+ * @range: Pointer to the GPU SVM range to be removed
+ *
+ * This function removes the specified GPU SVM range and also removes the parent
+ * GPU SVM notifier if no more ranges remain in the notifier. The caller must
+ * hold a lock to protect range and notifier removal.
+ */
+void drm_gpusvm_range_remove(struct drm_gpusvm *gpusvm,
+			     struct drm_gpusvm_range *range)
+{
+	unsigned long npages = npages_in_range(drm_gpusvm_range_start(range),
+					       drm_gpusvm_range_end(range));
+	struct drm_gpusvm_notifier *notifier;
+
+	drm_gpusvm_driver_lock_held(gpusvm);
+
+	notifier = drm_gpusvm_notifier_find(gpusvm,
+					    drm_gpusvm_range_start(range));
+	if (WARN_ON_ONCE(!notifier))
+		return;
+
+	drm_gpusvm_notifier_lock(gpusvm);
+	__drm_gpusvm_range_unmap_pages(gpusvm, range, npages);
+	drm_gpusvm_range_free_pages(gpusvm, range);
+	__drm_gpusvm_range_remove(notifier, range);
+	drm_gpusvm_notifier_unlock(gpusvm);
+
+	drm_gpusvm_range_put(range);
+
+	if (RB_EMPTY_ROOT(&notifier->root.rb_root)) {
+		if (!notifier->flags.removed)
+			mmu_interval_notifier_remove(&notifier->notifier);
+		drm_gpusvm_notifier_remove(gpusvm, notifier);
+		drm_gpusvm_notifier_free(gpusvm, notifier);
+	}
+}
+EXPORT_SYMBOL_GPL(drm_gpusvm_range_remove);
+
+/**
+ * drm_gpusvm_range_get() - Get a reference to GPU SVM range
+ * @range: Pointer to the GPU SVM range
+ *
+ * This function increments the reference count of the specified GPU SVM range.
+ *
+ * Return: Pointer to the GPU SVM range.
+ */
+struct drm_gpusvm_range *
+drm_gpusvm_range_get(struct drm_gpusvm_range *range)
+{
+	kref_get(&range->refcount);
+
+	return range;
+}
+EXPORT_SYMBOL_GPL(drm_gpusvm_range_get);
+
+/**
+ * drm_gpusvm_range_destroy() - Destroy GPU SVM range
+ * @refcount: Pointer to the reference counter embedded in the GPU SVM range
+ *
+ * This function destroys the specified GPU SVM range when its reference count
+ * reaches zero. If a custom range-free function is provided, it is invoked to
+ * free the range; otherwise, the range is deallocated using kfree().
+ */
+static void drm_gpusvm_range_destroy(struct kref *refcount)
+{
+	struct drm_gpusvm_range *range =
+		container_of(refcount, struct drm_gpusvm_range, refcount);
+	struct drm_gpusvm *gpusvm = range->gpusvm;
+
+	if (gpusvm->ops->range_free)
+		gpusvm->ops->range_free(range);
+	else
+		kfree(range);
+}
+
+/**
+ * drm_gpusvm_range_put() - Put a reference to GPU SVM range
+ * @range: Pointer to the GPU SVM range
+ *
+ * This function decrements the reference count of the specified GPU SVM range
+ * and frees it when the count reaches zero.
+ */
+void drm_gpusvm_range_put(struct drm_gpusvm_range *range)
+{
+	kref_put(&range->refcount, drm_gpusvm_range_destroy);
+}
+EXPORT_SYMBOL_GPL(drm_gpusvm_range_put);
+
+/**
+ * drm_gpusvm_range_pages_valid() - GPU SVM range pages valid
+ * @gpusvm: Pointer to the GPU SVM structure
+ * @range: Pointer to the GPU SVM range structure
+ *
+ * This function determines if a GPU SVM range's pages are valid. Expected to be
+ * called holding gpusvm->notifier_lock and as the last step before committing a
+ * GPU binding. This is akin to a notifier seqno check in the HMM documentation,
+ * but due to wider notifiers (i.e., notifiers which span multiple ranges) this
+ * function is required for finer grained checking (i.e., per range) if pages
+ * are valid.
+ *
+ * Return: True if GPU SVM range has valid pages, False otherwise
+ */
+bool drm_gpusvm_range_pages_valid(struct drm_gpusvm *gpusvm,
+				  struct drm_gpusvm_range *range)
+{
+	lockdep_assert_held(&gpusvm->notifier_lock);
+
+	return range->flags.has_devmem_pages || range->flags.has_dma_mapping;
+}
+EXPORT_SYMBOL_GPL(drm_gpusvm_range_pages_valid);
+
+/**
+ * drm_gpusvm_range_pages_valid_unlocked() - GPU SVM range pages valid unlocked
+ * @gpusvm: Pointer to the GPU SVM structure
+ * @range: Pointer to the GPU SVM range structure
+ *
+ * This function determines if a GPU SVM range's pages are valid. Expected to be
+ * called without holding gpusvm->notifier_lock.
+ *
+ * Return: True if GPU SVM range has valid pages, False otherwise
+ */
+static bool
+drm_gpusvm_range_pages_valid_unlocked(struct drm_gpusvm *gpusvm,
+				      struct drm_gpusvm_range *range)
+{
+	bool pages_valid;
+
+	if (!range->dma_addr)
+		return false;
+
+	drm_gpusvm_notifier_lock(gpusvm);
+	pages_valid = drm_gpusvm_range_pages_valid(gpusvm, range);
+	if (!pages_valid)
+		drm_gpusvm_range_free_pages(gpusvm, range);
+	drm_gpusvm_notifier_unlock(gpusvm);
+
+	return pages_valid;
+}
+
+/**
+ * drm_gpusvm_range_get_pages() - Get pages for a GPU SVM range
+ * @gpusvm: Pointer to the GPU SVM structure
+ * @range: Pointer to the GPU SVM range structure
+ * @ctx: GPU SVM context
+ *
+ * This function gets pages for a GPU SVM range and ensures they are mapped for
+ * DMA access.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int drm_gpusvm_range_get_pages(struct drm_gpusvm *gpusvm,
+			       struct drm_gpusvm_range *range,
+			       const struct drm_gpusvm_ctx *ctx)
+{
+	struct mmu_interval_notifier *notifier = &range->notifier->notifier;
+	struct hmm_range hmm_range = {
+		.default_flags = HMM_PFN_REQ_FAULT | (ctx->read_only ? 0 :
+			HMM_PFN_REQ_WRITE),
+		.notifier = notifier,
+		.start = drm_gpusvm_range_start(range),
+		.end = drm_gpusvm_range_end(range),
+		.dev_private_owner = gpusvm->device_private_page_owner,
+	};
+	struct mm_struct *mm = gpusvm->mm;
+	struct drm_gpusvm_zdd *zdd;
+	unsigned long timeout =
+		jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
+	unsigned long i, j;
+	unsigned long npages = npages_in_range(drm_gpusvm_range_start(range),
+					       drm_gpusvm_range_end(range));
+	unsigned long num_dma_mapped;
+	unsigned int order = 0;
+	unsigned long *pfns;
+	struct page **pages;
+	int err = 0;
+	struct dev_pagemap *pagemap;
+	struct drm_pagemap *dpagemap;
+
+retry:
+	hmm_range.notifier_seq = mmu_interval_read_begin(notifier);
+	if (drm_gpusvm_range_pages_valid_unlocked(gpusvm, range))
+		goto set_seqno;
+
+	pfns = kvmalloc_array(npages, sizeof(*pfns), GFP_KERNEL);
+	if (!pfns)
+		return -ENOMEM;
+
+	if (!mmget_not_zero(mm)) {
+		err = -EFAULT;
+		goto err_free;
+	}
+
+	hmm_range.hmm_pfns = pfns;
+	while (true) {
+		mmap_read_lock(mm);
+		err = hmm_range_fault(&hmm_range);
+		mmap_read_unlock(mm);
+
+		if (err == -EBUSY) {
+			if (time_after(jiffies, timeout))
+				break;
+
+			hmm_range.notifier_seq =
+				mmu_interval_read_begin(notifier);
+			continue;
+		}
+		break;
+	}
+	mmput(mm);
+	if (err)
+		goto err_free;
+
+	pages = (struct page **)pfns;
+map_pages:
+	/*
+	 * Perform all dma mappings under the notifier lock to not
+	 * access freed pages. A notifier will either block on
+	 * the notifier lock or unmap dma.
+	 */
+	drm_gpusvm_notifier_lock(gpusvm);
+
+	if (range->flags.unmapped) {
+		drm_gpusvm_notifier_unlock(gpusvm);
+		err = -EFAULT;
+		goto err_free;
+	}
+
+	if (mmu_interval_read_retry(notifier, hmm_range.notifier_seq)) {
+		drm_gpusvm_notifier_unlock(gpusvm);
+		kvfree(pfns);
+		goto retry;
+	}
+
+	if (!range->dma_addr) {
+		/* Unlock and restart mapping to allocate memory. */
+		drm_gpusvm_notifier_unlock(gpusvm);
+		range->dma_addr = kvmalloc_array(npages,
+						 sizeof(*range->dma_addr),
+						 GFP_KERNEL);
+		if (!range->dma_addr) {
+			err = -ENOMEM;
+			goto err_free;
+		}
+		goto map_pages;
+	}
+
+	zdd = NULL;
+	num_dma_mapped = 0;
+	for (i = 0, j = 0; i < npages; ++j) {
+		struct page *page = hmm_pfn_to_page(pfns[i]);
+
+		order = hmm_pfn_to_map_order(pfns[i]);
+		if (is_device_private_page(page) ||
+		    is_device_coherent_page(page)) {
+			if (zdd != page->zone_device_data && i > 0) {
+				err = -EOPNOTSUPP;
+				goto err_unmap;
+			}
+			zdd = page->zone_device_data;
+			if (pagemap != page->pgmap) {
+				if (i > 0) {
+					err = -EOPNOTSUPP;
+					goto err_unmap;
+				}
+
+				pagemap = page->pgmap;
+				dpagemap = zdd->devmem_allocation->dpagemap;
+				if (drm_WARN_ON(gpusvm->drm, !dpagemap)) {
+					/*
+					 * Raced. This is not supposed to happen
+					 * since hmm_range_fault() should've migrated
+					 * this page to system.
+					 */
+					err = -EAGAIN;
+					goto err_unmap;
+				}
+			}
+			range->dma_addr[j] =
+				dpagemap->ops->device_map(dpagemap,
+							  gpusvm->drm->dev,
+							  page, order,
+							  DMA_BIDIRECTIONAL);
+			if (dma_mapping_error(gpusvm->drm->dev,
+					      range->dma_addr[j].addr)) {
+				err = -EFAULT;
+				goto err_unmap;
+			}
+
+			pages[i] = page;
+		} else {
+			dma_addr_t addr;
+
+			if (is_zone_device_page(page) || zdd) {
+				err = -EOPNOTSUPP;
+				goto err_unmap;
+			}
+
+			addr = dma_map_page(gpusvm->drm->dev,
+					    page, 0,
+					    PAGE_SIZE << order,
+					    DMA_BIDIRECTIONAL);
+			if (dma_mapping_error(gpusvm->drm->dev, addr)) {
+				err = -EFAULT;
+				goto err_unmap;
+			}
+
+			range->dma_addr[j] = drm_pagemap_device_addr_encode
+				(addr, DRM_INTERCONNECT_SYSTEM, order,
+				 DMA_BIDIRECTIONAL);
+		}
+		i += 1 << order;
+		num_dma_mapped = i;
+	}
+
+	range->flags.has_dma_mapping = true;
+	if (zdd) {
+		range->flags.has_devmem_pages = true;
+		range->dpagemap = dpagemap;
+	}
+
+	drm_gpusvm_notifier_unlock(gpusvm);
+	kvfree(pfns);
+set_seqno:
+	range->notifier_seq = hmm_range.notifier_seq;
+
+	return 0;
+
+err_unmap:
+	__drm_gpusvm_range_unmap_pages(gpusvm, range, num_dma_mapped);
+	drm_gpusvm_notifier_unlock(gpusvm);
+err_free:
+	kvfree(pfns);
+	if (err == -EAGAIN)
+		goto retry;
+	return err;
+}
+EXPORT_SYMBOL_GPL(drm_gpusvm_range_get_pages);
+
+/**
+ * drm_gpusvm_range_unmap_pages() - Unmap pages associated with a GPU SVM range
+ * @gpusvm: Pointer to the GPU SVM structure
+ * @range: Pointer to the GPU SVM range structure
+ * @ctx: GPU SVM context
+ *
+ * This function unmaps pages associated with a GPU SVM range. If
+ * @ctx->in_notifier is set, it is assumed that gpusvm->notifier_lock is held in
+ * write mode; if it is clear, it acquires gpusvm->notifier_lock in read mode.
+ * Must be called on each GPU SVM range attached to notifier in
+ * gpusvm->ops->invalidate for IOMMU security model.
+ */
+void drm_gpusvm_range_unmap_pages(struct drm_gpusvm *gpusvm,
+				  struct drm_gpusvm_range *range,
+				  const struct drm_gpusvm_ctx *ctx)
+{
+	unsigned long npages = npages_in_range(drm_gpusvm_range_start(range),
+					       drm_gpusvm_range_end(range));
+
+	if (ctx->in_notifier)
+		lockdep_assert_held_write(&gpusvm->notifier_lock);
+	else
+		drm_gpusvm_notifier_lock(gpusvm);
+
+	__drm_gpusvm_range_unmap_pages(gpusvm, range, npages);
+
+	if (!ctx->in_notifier)
+		drm_gpusvm_notifier_unlock(gpusvm);
+}
+EXPORT_SYMBOL_GPL(drm_gpusvm_range_unmap_pages);
+
+/**
+ * drm_gpusvm_migration_unlock_put_page() - Put a migration page
+ * @page: Pointer to the page to put
+ *
+ * This function unlocks and puts a page.
+ */
+static void drm_gpusvm_migration_unlock_put_page(struct page *page)
+{
+	unlock_page(page);
+	put_page(page);
+}
+
+/**
+ * drm_gpusvm_migration_unlock_put_pages() - Put migration pages
+ * @npages: Number of pages
+ * @migrate_pfn: Array of migrate page frame numbers
+ *
+ * This function unlocks and puts an array of pages.
+ */
+static void drm_gpusvm_migration_unlock_put_pages(unsigned long npages,
+						  unsigned long *migrate_pfn)
+{
+	unsigned long i;
+
+	for (i = 0; i < npages; ++i) {
+		struct page *page;
+
+		if (!migrate_pfn[i])
+			continue;
+
+		page = migrate_pfn_to_page(migrate_pfn[i]);
+		drm_gpusvm_migration_unlock_put_page(page);
+		migrate_pfn[i] = 0;
+	}
+}
+
+/**
+ * drm_gpusvm_get_devmem_page() - Get a reference to a device memory page
+ * @page: Pointer to the page
+ * @zdd: Pointer to the GPU SVM zone device data
+ *
+ * This function associates the given page with the specified GPU SVM zone
+ * device data and initializes it for zone device usage.
+ */
+static void drm_gpusvm_get_devmem_page(struct page *page,
+				       struct drm_gpusvm_zdd *zdd)
+{
+	page->zone_device_data = drm_gpusvm_zdd_get(zdd);
+	zone_device_page_init(page);
+}
+
+/**
+ * drm_gpusvm_migrate_map_pages() - Map migration pages for GPU SVM migration
+ * @dev: The device for which the pages are being mapped
+ * @dma_addr: Array to store DMA addresses corresponding to mapped pages
+ * @migrate_pfn: Array of migrate page frame numbers to map
+ * @npages: Number of pages to map
+ * @dir: Direction of data transfer (e.g., DMA_BIDIRECTIONAL)
+ *
+ * This function maps pages of memory for migration usage in GPU SVM. It
+ * iterates over each page frame number provided in @migrate_pfn, maps the
+ * corresponding page, and stores the DMA address in the provided @dma_addr
+ * array.
+ *
+ * Return: 0 on success, -EFAULT if an error occurs during mapping.
+ */
+static int drm_gpusvm_migrate_map_pages(struct device *dev,
+					dma_addr_t *dma_addr,
+					unsigned long *migrate_pfn,
+					unsigned long npages,
+					enum dma_data_direction dir)
+{
+	unsigned long i;
+
+	for (i = 0; i < npages; ++i) {
+		struct page *page = migrate_pfn_to_page(migrate_pfn[i]);
+
+		if (!page)
+			continue;
+
+		if (WARN_ON_ONCE(is_zone_device_page(page)))
+			return -EFAULT;
+
+		dma_addr[i] = dma_map_page(dev, page, 0, PAGE_SIZE, dir);
+		if (dma_mapping_error(dev, dma_addr[i]))
+			return -EFAULT;
+	}
+
+	return 0;
+}
+
+/**
+ * drm_gpusvm_migrate_unmap_pages() - Unmap pages previously mapped for GPU SVM migration
+ * @dev: The device for which the pages were mapped
+ * @dma_addr: Array of DMA addresses corresponding to mapped pages
+ * @npages: Number of pages to unmap
+ * @dir: Direction of data transfer (e.g., DMA_BIDIRECTIONAL)
+ *
+ * This function unmaps previously mapped pages of memory for GPU Shared Virtual
+ * Memory (SVM). It iterates over each DMA address provided in @dma_addr, checks
+ * if it's valid and not already unmapped, and unmaps the corresponding page.
+ */
+static void drm_gpusvm_migrate_unmap_pages(struct device *dev,
+					   dma_addr_t *dma_addr,
+					   unsigned long npages,
+					   enum dma_data_direction dir)
+{
+	unsigned long i;
+
+	for (i = 0; i < npages; ++i) {
+		if (!dma_addr[i] || dma_mapping_error(dev, dma_addr[i]))
+			continue;
+
+		dma_unmap_page(dev, dma_addr[i], PAGE_SIZE, dir);
+	}
+}
+
+/**
+ * drm_gpusvm_migrate_to_devmem() - Migrate GPU SVM range to device memory
+ * @gpusvm: Pointer to the GPU SVM structure
+ * @range: Pointer to the GPU SVM range structure
+ * @devmem_allocation: Pointer to the device memory allocation. The caller
+ *                     should hold a reference to the device memory allocation,
+ *                     which should be dropped via ops->devmem_release or upon
+ *                     the failure of this function.
+ * @ctx: GPU SVM context
+ *
+ * This function migrates the specified GPU SVM range to device memory. It
+ * performs the necessary setup and invokes the driver-specific operations for
+ * migration to device memory. Upon successful return, @devmem_allocation can
+ * safely reference @range until ops->devmem_release is called, which happens
+ * only after a successful return. Expected to be called while holding the mmap
+ * lock in read mode.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int drm_gpusvm_migrate_to_devmem(struct drm_gpusvm *gpusvm,
+				 struct drm_gpusvm_range *range,
+				 struct drm_gpusvm_devmem *devmem_allocation,
+				 const struct drm_gpusvm_ctx *ctx)
+{
+	const struct drm_gpusvm_devmem_ops *ops = devmem_allocation->ops;
+	unsigned long start = drm_gpusvm_range_start(range),
+		      end = drm_gpusvm_range_end(range);
+	struct migrate_vma migrate = {
+		.start = start,
+		.end = end,
+		.pgmap_owner = gpusvm->device_private_page_owner,
+		.flags = MIGRATE_VMA_SELECT_SYSTEM,
+	};
+	struct mm_struct *mm = gpusvm->mm;
+	unsigned long i, npages = npages_in_range(start, end);
+	struct vm_area_struct *vas;
+	struct drm_gpusvm_zdd *zdd = NULL;
+	struct page **pages;
+	dma_addr_t *dma_addr;
+	void *buf;
+	int err;
+
+	mmap_assert_locked(gpusvm->mm);
+
+	if (!range->flags.migrate_devmem)
+		return -EINVAL;
+
+	if (!ops->populate_devmem_pfn || !ops->copy_to_devmem ||
+	    !ops->copy_to_ram)
+		return -EOPNOTSUPP;
+
+	vas = vma_lookup(mm, start);
+	if (!vas) {
+		err = -ENOENT;
+		goto err_out;
+	}
+
+	if (end > vas->vm_end || start < vas->vm_start) {
+		err = -EINVAL;
+		goto err_out;
+	}
+
+	if (!vma_is_anonymous(vas)) {
+		err = -EBUSY;
+		goto err_out;
+	}
+
+	buf = kvcalloc(npages, 2 * sizeof(*migrate.src) + sizeof(*dma_addr) +
+		       sizeof(*pages), GFP_KERNEL);
+	if (!buf) {
+		err = -ENOMEM;
+		goto err_out;
+	}
+	dma_addr = buf + (2 * sizeof(*migrate.src) * npages);
+	pages = buf + (2 * sizeof(*migrate.src) + sizeof(*dma_addr)) * npages;
+
+	zdd = drm_gpusvm_zdd_alloc(gpusvm->device_private_page_owner);
+	if (!zdd) {
+		err = -ENOMEM;
+		goto err_free;
+	}
+
+	migrate.vma = vas;
+	migrate.src = buf;
+	migrate.dst = migrate.src + npages;
+
+	err = migrate_vma_setup(&migrate);
+	if (err)
+		goto err_free;
+
+	if (!migrate.cpages) {
+		err = -EFAULT;
+		goto err_free;
+	}
+
+	if (migrate.cpages != npages) {
+		err = -EBUSY;
+		goto err_finalize;
+	}
+
+	err = ops->populate_devmem_pfn(devmem_allocation, npages, migrate.dst);
+	if (err)
+		goto err_finalize;
+
+	err = drm_gpusvm_migrate_map_pages(devmem_allocation->dev, dma_addr,
+					   migrate.src, npages, DMA_TO_DEVICE);
+	if (err)
+		goto err_finalize;
+
+	for (i = 0; i < npages; ++i) {
+		struct page *page = pfn_to_page(migrate.dst[i]);
+
+		pages[i] = page;
+		migrate.dst[i] = migrate_pfn(migrate.dst[i]);
+		drm_gpusvm_get_devmem_page(page, zdd);
+	}
+
+	err = ops->copy_to_devmem(pages, dma_addr, npages);
+	if (err)
+		goto err_finalize;
+
+	/* Upon success bind devmem allocation to range and zdd */
+	zdd->devmem_allocation = devmem_allocation;	/* Owns ref */
+
+err_finalize:
+	if (err)
+		drm_gpusvm_migration_unlock_put_pages(npages, migrate.dst);
+	migrate_vma_pages(&migrate);
+	migrate_vma_finalize(&migrate);
+ drm_gpusvm_migrate_unmap_pages(devmem_allocation->dev, dma_addr, npages, 1761 + DMA_TO_DEVICE); 1762 + err_free: 1763 + if (zdd) 1764 + drm_gpusvm_zdd_put(zdd); 1765 + kvfree(buf); 1766 + err_out: 1767 + return err; 1768 + } 1769 + EXPORT_SYMBOL_GPL(drm_gpusvm_migrate_to_devmem); 1770 + 1771 + /** 1772 + * drm_gpusvm_migrate_populate_ram_pfn() - Populate RAM PFNs for a VM area 1773 + * @vas: Pointer to the VM area structure, can be NULL 1774 + * @fault_page: Fault page 1775 + * @npages: Number of pages to populate 1776 + * @mpages: Number of pages to migrate 1777 + * @src_mpfn: Source array of migrate PFNs 1778 + * @mpfn: Array of migrate PFNs to populate 1779 + * @addr: Start address for PFN allocation 1780 + * 1781 + * This function populates the RAM migrate page frame numbers (PFNs) for the 1782 + * specified VM area structure. It allocates and locks pages in the VM area for 1783 + * RAM usage. If vas is non-NULL use alloc_page_vma for allocation, if NULL use 1784 + * alloc_page for allocation. 1785 + * 1786 + * Return: 0 on success, negative error code on failure. 
1787 + */ 1788 + static int drm_gpusvm_migrate_populate_ram_pfn(struct vm_area_struct *vas, 1789 + struct page *fault_page, 1790 + unsigned long npages, 1791 + unsigned long *mpages, 1792 + unsigned long *src_mpfn, 1793 + unsigned long *mpfn, 1794 + unsigned long addr) 1795 + { 1796 + unsigned long i; 1797 + 1798 + for (i = 0; i < npages; ++i, addr += PAGE_SIZE) { 1799 + struct page *page, *src_page; 1800 + 1801 + if (!(src_mpfn[i] & MIGRATE_PFN_MIGRATE)) 1802 + continue; 1803 + 1804 + src_page = migrate_pfn_to_page(src_mpfn[i]); 1805 + if (!src_page) 1806 + continue; 1807 + 1808 + if (fault_page) { 1809 + if (src_page->zone_device_data != 1810 + fault_page->zone_device_data) 1811 + continue; 1812 + } 1813 + 1814 + if (vas) 1815 + page = alloc_page_vma(GFP_HIGHUSER, vas, addr); 1816 + else 1817 + page = alloc_page(GFP_HIGHUSER); 1818 + 1819 + if (!page) 1820 + goto free_pages; 1821 + 1822 + mpfn[i] = migrate_pfn(page_to_pfn(page)); 1823 + } 1824 + 1825 + for (i = 0; i < npages; ++i) { 1826 + struct page *page = migrate_pfn_to_page(mpfn[i]); 1827 + 1828 + if (!page) 1829 + continue; 1830 + 1831 + WARN_ON_ONCE(!trylock_page(page)); 1832 + ++*mpages; 1833 + } 1834 + 1835 + return 0; 1836 + 1837 + free_pages: 1838 + for (i = 0; i < npages; ++i) { 1839 + struct page *page = migrate_pfn_to_page(mpfn[i]); 1840 + 1841 + if (!page) 1842 + continue; 1843 + 1844 + put_page(page); 1845 + mpfn[i] = 0; 1846 + } 1847 + return -ENOMEM; 1848 + } 1849 + 1850 + /** 1851 + * drm_gpusvm_evict_to_ram() - Evict GPU SVM range to RAM 1852 + * @devmem_allocation: Pointer to the device memory allocation 1853 + * 1854 + * Similar to __drm_gpusvm_migrate_to_ram but does not require mmap lock and 1855 + * migration done via migrate_device_* functions. 1856 + * 1857 + * Return: 0 on success, negative error code on failure. 
1858 + */ 1859 + int drm_gpusvm_evict_to_ram(struct drm_gpusvm_devmem *devmem_allocation) 1860 + { 1861 + const struct drm_gpusvm_devmem_ops *ops = devmem_allocation->ops; 1862 + unsigned long npages, mpages = 0; 1863 + struct page **pages; 1864 + unsigned long *src, *dst; 1865 + dma_addr_t *dma_addr; 1866 + void *buf; 1867 + int i, err = 0; 1868 + unsigned int retry_count = 2; 1869 + 1870 + npages = devmem_allocation->size >> PAGE_SHIFT; 1871 + 1872 + retry: 1873 + if (!mmget_not_zero(devmem_allocation->mm)) 1874 + return -EFAULT; 1875 + 1876 + buf = kvcalloc(npages, 2 * sizeof(*src) + sizeof(*dma_addr) + 1877 + sizeof(*pages), GFP_KERNEL); 1878 + if (!buf) { 1879 + err = -ENOMEM; 1880 + goto err_out; 1881 + } 1882 + src = buf; 1883 + dst = buf + (sizeof(*src) * npages); 1884 + dma_addr = buf + (2 * sizeof(*src) * npages); 1885 + pages = buf + (2 * sizeof(*src) + sizeof(*dma_addr)) * npages; 1886 + 1887 + err = ops->populate_devmem_pfn(devmem_allocation, npages, src); 1888 + if (err) 1889 + goto err_free; 1890 + 1891 + err = migrate_device_pfns(src, npages); 1892 + if (err) 1893 + goto err_free; 1894 + 1895 + err = drm_gpusvm_migrate_populate_ram_pfn(NULL, NULL, npages, &mpages, 1896 + src, dst, 0); 1897 + if (err || !mpages) 1898 + goto err_finalize; 1899 + 1900 + err = drm_gpusvm_migrate_map_pages(devmem_allocation->dev, dma_addr, 1901 + dst, npages, DMA_FROM_DEVICE); 1902 + if (err) 1903 + goto err_finalize; 1904 + 1905 + for (i = 0; i < npages; ++i) 1906 + pages[i] = migrate_pfn_to_page(src[i]); 1907 + 1908 + err = ops->copy_to_ram(pages, dma_addr, npages); 1909 + if (err) 1910 + goto err_finalize; 1911 + 1912 + err_finalize: 1913 + if (err) 1914 + drm_gpusvm_migration_unlock_put_pages(npages, dst); 1915 + migrate_device_pages(src, dst, npages); 1916 + migrate_device_finalize(src, dst, npages); 1917 + drm_gpusvm_migrate_unmap_pages(devmem_allocation->dev, dma_addr, npages, 1918 + DMA_FROM_DEVICE); 1919 + err_free: 1920 + kvfree(buf); 1921 + err_out: 1922 + 
mmput_async(devmem_allocation->mm); 1923 + 1924 + if (completion_done(&devmem_allocation->detached)) 1925 + return 0; 1926 + 1927 + if (retry_count--) { 1928 + cond_resched(); 1929 + goto retry; 1930 + } 1931 + 1932 + return err ?: -EBUSY; 1933 + } 1934 + EXPORT_SYMBOL_GPL(drm_gpusvm_evict_to_ram); 1935 + 1936 + /** 1937 + * __drm_gpusvm_migrate_to_ram() - Migrate GPU SVM range to RAM (internal) 1938 + * @vas: Pointer to the VM area structure 1939 + * @device_private_page_owner: Device private pages owner 1940 + * @page: Pointer to the page for fault handling (can be NULL) 1941 + * @fault_addr: Fault address 1942 + * @size: Size of migration 1943 + * 1944 + * This internal function performs the migration of the specified GPU SVM range 1945 + * to RAM. It sets up the migration, populates + dma maps RAM PFNs, and 1946 + * invokes the driver-specific operations for migration to RAM. 1947 + * 1948 + * Return: 0 on success, negative error code on failure. 1949 + */ 1950 + static int __drm_gpusvm_migrate_to_ram(struct vm_area_struct *vas, 1951 + void *device_private_page_owner, 1952 + struct page *page, 1953 + unsigned long fault_addr, 1954 + unsigned long size) 1955 + { 1956 + struct migrate_vma migrate = { 1957 + .vma = vas, 1958 + .pgmap_owner = device_private_page_owner, 1959 + .flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE | 1960 + MIGRATE_VMA_SELECT_DEVICE_COHERENT, 1961 + .fault_page = page, 1962 + }; 1963 + struct drm_gpusvm_zdd *zdd; 1964 + const struct drm_gpusvm_devmem_ops *ops; 1965 + struct device *dev = NULL; 1966 + unsigned long npages, mpages = 0; 1967 + struct page **pages; 1968 + dma_addr_t *dma_addr; 1969 + unsigned long start, end; 1970 + void *buf; 1971 + int i, err = 0; 1972 + 1973 + start = ALIGN_DOWN(fault_addr, size); 1974 + end = ALIGN(fault_addr + 1, size); 1975 + 1976 + /* Corner where VMA area struct has been partially unmapped */ 1977 + if (start < vas->vm_start) 1978 + start = vas->vm_start; 1979 + if (end > vas->vm_end) 1980 + end = 
vas->vm_end; 1981 + 1982 + migrate.start = start; 1983 + migrate.end = end; 1984 + npages = npages_in_range(start, end); 1985 + 1986 + buf = kvcalloc(npages, 2 * sizeof(*migrate.src) + sizeof(*dma_addr) + 1987 + sizeof(*pages), GFP_KERNEL); 1988 + if (!buf) { 1989 + err = -ENOMEM; 1990 + goto err_out; 1991 + } 1992 + dma_addr = buf + (2 * sizeof(*migrate.src) * npages); 1993 + pages = buf + (2 * sizeof(*migrate.src) + sizeof(*dma_addr)) * npages; 1994 + 1995 + migrate.vma = vas; 1996 + migrate.src = buf; 1997 + migrate.dst = migrate.src + npages; 1998 + 1999 + err = migrate_vma_setup(&migrate); 2000 + if (err) 2001 + goto err_free; 2002 + 2003 + /* Raced with another CPU fault, nothing to do */ 2004 + if (!migrate.cpages) 2005 + goto err_free; 2006 + 2007 + if (!page) { 2008 + for (i = 0; i < npages; ++i) { 2009 + if (!(migrate.src[i] & MIGRATE_PFN_MIGRATE)) 2010 + continue; 2011 + 2012 + page = migrate_pfn_to_page(migrate.src[i]); 2013 + break; 2014 + } 2015 + 2016 + if (!page) 2017 + goto err_finalize; 2018 + } 2019 + zdd = page->zone_device_data; 2020 + ops = zdd->devmem_allocation->ops; 2021 + dev = zdd->devmem_allocation->dev; 2022 + 2023 + err = drm_gpusvm_migrate_populate_ram_pfn(vas, page, npages, &mpages, 2024 + migrate.src, migrate.dst, 2025 + start); 2026 + if (err) 2027 + goto err_finalize; 2028 + 2029 + err = drm_gpusvm_migrate_map_pages(dev, dma_addr, migrate.dst, npages, 2030 + DMA_FROM_DEVICE); 2031 + if (err) 2032 + goto err_finalize; 2033 + 2034 + for (i = 0; i < npages; ++i) 2035 + pages[i] = migrate_pfn_to_page(migrate.src[i]); 2036 + 2037 + err = ops->copy_to_ram(pages, dma_addr, npages); 2038 + if (err) 2039 + goto err_finalize; 2040 + 2041 + err_finalize: 2042 + if (err) 2043 + drm_gpusvm_migration_unlock_put_pages(npages, migrate.dst); 2044 + migrate_vma_pages(&migrate); 2045 + migrate_vma_finalize(&migrate); 2046 + if (dev) 2047 + drm_gpusvm_migrate_unmap_pages(dev, dma_addr, npages, 2048 + DMA_FROM_DEVICE); 2049 + err_free: 2050 + 
kvfree(buf); 2051 + err_out: 2052 + 2053 + return err; 2054 + } 2055 + 2056 + /** 2057 + * drm_gpusvm_range_evict - Evict GPU SVM range 2058 + * @pagemap: Pointer to the GPU SVM structure 2059 + * @range: Pointer to the GPU SVM range to be removed 2060 + * 2061 + * This function evicts the specified GPU SVM range. This function will not 2062 + * evict coherent pages. 2063 + * 2064 + * Return: 0 on success, a negative error code on failure. 2065 + */ 2066 + int drm_gpusvm_range_evict(struct drm_gpusvm *gpusvm, 2067 + struct drm_gpusvm_range *range) 2068 + { 2069 + struct mmu_interval_notifier *notifier = &range->notifier->notifier; 2070 + struct hmm_range hmm_range = { 2071 + .default_flags = HMM_PFN_REQ_FAULT, 2072 + .notifier = notifier, 2073 + .start = drm_gpusvm_range_start(range), 2074 + .end = drm_gpusvm_range_end(range), 2075 + .dev_private_owner = NULL, 2076 + }; 2077 + unsigned long timeout = 2078 + jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT); 2079 + unsigned long *pfns; 2080 + unsigned long npages = npages_in_range(drm_gpusvm_range_start(range), 2081 + drm_gpusvm_range_end(range)); 2082 + int err = 0; 2083 + struct mm_struct *mm = gpusvm->mm; 2084 + 2085 + if (!mmget_not_zero(mm)) 2086 + return -EFAULT; 2087 + 2088 + pfns = kvmalloc_array(npages, sizeof(*pfns), GFP_KERNEL); 2089 + if (!pfns) 2090 + return -ENOMEM; 2091 + 2092 + hmm_range.hmm_pfns = pfns; 2093 + while (!time_after(jiffies, timeout)) { 2094 + hmm_range.notifier_seq = mmu_interval_read_begin(notifier); 2095 + if (time_after(jiffies, timeout)) { 2096 + err = -ETIME; 2097 + break; 2098 + } 2099 + 2100 + mmap_read_lock(mm); 2101 + err = hmm_range_fault(&hmm_range); 2102 + mmap_read_unlock(mm); 2103 + if (err != -EBUSY) 2104 + break; 2105 + } 2106 + 2107 + kvfree(pfns); 2108 + mmput(mm); 2109 + 2110 + return err; 2111 + } 2112 + EXPORT_SYMBOL_GPL(drm_gpusvm_range_evict); 2113 + 2114 + /** 2115 + * drm_gpusvm_page_free() - Put GPU SVM zone device data associated with a page 2116 + * 
@page: Pointer to the page 2117 + * 2118 + * This function is a callback used to put the GPU SVM zone device data 2119 + * associated with a page when it is being released. 2120 + */ 2121 + static void drm_gpusvm_page_free(struct page *page) 2122 + { 2123 + drm_gpusvm_zdd_put(page->zone_device_data); 2124 + } 2125 + 2126 + /** 2127 + * drm_gpusvm_migrate_to_ram() - Migrate GPU SVM range to RAM (page fault handler) 2128 + * @vmf: Pointer to the fault information structure 2129 + * 2130 + * This function is a page fault handler used to migrate a GPU SVM range to RAM. 2131 + * It retrieves the GPU SVM range information from the faulting page and invokes 2132 + * the internal migration function to migrate the range back to RAM. 2133 + * 2134 + * Return: VM_FAULT_SIGBUS on failure, 0 on success. 2135 + */ 2136 + static vm_fault_t drm_gpusvm_migrate_to_ram(struct vm_fault *vmf) 2137 + { 2138 + struct drm_gpusvm_zdd *zdd = vmf->page->zone_device_data; 2139 + int err; 2140 + 2141 + err = __drm_gpusvm_migrate_to_ram(vmf->vma, 2142 + zdd->device_private_page_owner, 2143 + vmf->page, vmf->address, 2144 + zdd->devmem_allocation->size); 2145 + 2146 + return err ? VM_FAULT_SIGBUS : 0; 2147 + } 2148 + 2149 + /** 2150 + * drm_gpusvm_pagemap_ops() - Device page map operations for GPU SVM 2151 + */ 2152 + static const struct dev_pagemap_ops drm_gpusvm_pagemap_ops = { 2153 + .page_free = drm_gpusvm_page_free, 2154 + .migrate_to_ram = drm_gpusvm_migrate_to_ram, 2155 + }; 2156 + 2157 + /** 2158 + * drm_gpusvm_pagemap_ops_get() - Retrieve GPU SVM device page map operations 2159 + * 2160 + * Return: Pointer to the GPU SVM device page map operations structure. 
2161 + */ 2162 + const struct dev_pagemap_ops *drm_gpusvm_pagemap_ops_get(void) 2163 + { 2164 + return &drm_gpusvm_pagemap_ops; 2165 + } 2166 + EXPORT_SYMBOL_GPL(drm_gpusvm_pagemap_ops_get); 2167 + 2168 + /** 2169 + * drm_gpusvm_has_mapping() - Check if GPU SVM has mapping for the given address range 2170 + * @gpusvm: Pointer to the GPU SVM structure. 2171 + * @start: Start address 2172 + * @end: End address 2173 + * 2174 + * Return: True if GPU SVM has mapping, False otherwise 2175 + */ 2176 + bool drm_gpusvm_has_mapping(struct drm_gpusvm *gpusvm, unsigned long start, 2177 + unsigned long end) 2178 + { 2179 + struct drm_gpusvm_notifier *notifier; 2180 + 2181 + drm_gpusvm_for_each_notifier(notifier, gpusvm, start, end) { 2182 + struct drm_gpusvm_range *range = NULL; 2183 + 2184 + drm_gpusvm_for_each_range(range, notifier, start, end) 2185 + return true; 2186 + } 2187 + 2188 + return false; 2189 + } 2190 + EXPORT_SYMBOL_GPL(drm_gpusvm_has_mapping); 2191 + 2192 + /** 2193 + * drm_gpusvm_range_set_unmapped() - Mark a GPU SVM range as unmapped 2194 + * @range: Pointer to the GPU SVM range structure. 2195 + * @mmu_range: Pointer to the MMU notifier range structure. 2196 + * 2197 + * This function marks a GPU SVM range as unmapped and sets the partial_unmap flag 2198 + * if the range partially falls within the provided MMU notifier range. 
2199 + */ 2200 + void drm_gpusvm_range_set_unmapped(struct drm_gpusvm_range *range, 2201 + const struct mmu_notifier_range *mmu_range) 2202 + { 2203 + lockdep_assert_held_write(&range->gpusvm->notifier_lock); 2204 + 2205 + range->flags.unmapped = true; 2206 + if (drm_gpusvm_range_start(range) < mmu_range->start || 2207 + drm_gpusvm_range_end(range) > mmu_range->end) 2208 + range->flags.partial_unmap = true; 2209 + } 2210 + EXPORT_SYMBOL_GPL(drm_gpusvm_range_set_unmapped); 2211 + 2212 + /** 2213 + * drm_gpusvm_devmem_init() - Initialize a GPU SVM device memory allocation 2214 + * 2215 + * @dev: Pointer to the device structure which device memory allocation belongs to 2216 + * @mm: Pointer to the mm_struct for the address space 2217 + * @ops: Pointer to the operations structure for GPU SVM device memory 2218 + * @dpagemap: The struct drm_pagemap we're allocating from. 2219 + * @size: Size of device memory allocation 2220 + */ 2221 + void drm_gpusvm_devmem_init(struct drm_gpusvm_devmem *devmem_allocation, 2222 + struct device *dev, struct mm_struct *mm, 2223 + const struct drm_gpusvm_devmem_ops *ops, 2224 + struct drm_pagemap *dpagemap, size_t size) 2225 + { 2226 + init_completion(&devmem_allocation->detached); 2227 + devmem_allocation->dev = dev; 2228 + devmem_allocation->mm = mm; 2229 + devmem_allocation->ops = ops; 2230 + devmem_allocation->dpagemap = dpagemap; 2231 + devmem_allocation->size = size; 2232 + } 2233 + EXPORT_SYMBOL_GPL(drm_gpusvm_devmem_init); 2234 + 2235 + MODULE_DESCRIPTION("DRM GPUSVM"); 2236 + MODULE_LICENSE("GPL");
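
The three migration helpers in the drm_gpusvm.c hunk above (drm_gpusvm_migrate_to_devmem, drm_gpusvm_evict_to_ram and __drm_gpusvm_migrate_to_ram) each make one zeroed allocation and carve it into four per-page arrays: source PFNs, destination PFNs, DMA addresses and page pointers. The following is a minimal userspace sketch of that layout only. It assumes calloc() as a stand-in for kvcalloc(), and every `sketch_`/`migrate_bufs` name is hypothetical and not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-ins for kernel types; these names are hypothetical. */
typedef uint64_t sketch_dma_addr_t;
struct sketch_page;

struct migrate_bufs {
	void *buf;			/* single backing allocation */
	unsigned long *src;		/* npages source migrate PFNs */
	unsigned long *dst;		/* npages destination migrate PFNs */
	sketch_dma_addr_t *dma_addr;	/* npages DMA addresses */
	struct sketch_page **pages;	/* npages page pointers */
};

/* Returns 0 on success, -1 if the allocation fails. */
static int migrate_bufs_alloc(struct migrate_bufs *b, size_t npages)
{
	char *base;

	/* calloc() zeroes the memory, as kvcalloc() does in the kernel. */
	base = calloc(npages, 2 * sizeof(*b->src) + sizeof(*b->dma_addr) +
			      sizeof(*b->pages));
	if (!base)
		return -1;

	b->buf = base;
	b->src = (unsigned long *)base;
	b->dst = b->src + npages;
	b->dma_addr = (sketch_dma_addr_t *)(base +
			2 * sizeof(*b->src) * npages);
	b->pages = (struct sketch_page **)(base +
			(2 * sizeof(*b->src) + sizeof(*b->dma_addr)) * npages);
	return 0;
}

/* One free releases all four arrays. */
static void migrate_bufs_free(struct migrate_bufs *b)
{
	free(b->buf);
	b->buf = NULL;
}
```

A single allocation means a single failure point and a single free on every error path, which fits the goto-based unwinding used in the patch.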

+10

drivers/gpu/drm/xe/Kconfig

··· 39 39 select DRM_TTM_HELPER 40 40 select DRM_EXEC 41 41 select DRM_GPUVM 42 + select DRM_GPUSVM if !UML && DEVICE_PRIVATE 42 43 select DRM_SCHED 43 44 select MMU_NOTIFIER 44 45 select WANT_DEV_COREDUMP ··· 73 72 link BW, for instance on a Thunderbolt link. 74 73 75 74 If in doubt say "Y". 75 + 76 + config DRM_XE_DEVMEM_MIRROR 77 + bool "Enable device memory mirror" 78 + depends on DRM_XE 79 + select GET_FREE_REGION 80 + default y 81 + help 82 + Disable this option only if you want to compile out without device 83 + memory mirror. Will reduce KMD memory footprint when disabled. 76 84 77 85 config DRM_XE_FORCE_PROBE 78 86 string "Force probe xe for selected Intel hardware IDs"

+3

drivers/gpu/drm/xe/Makefile

··· 33 33 xe_device_sysfs.o \ 34 34 xe_dma_buf.o \ 35 35 xe_drm_client.o \ 36 + xe_eu_stall.o \ 36 37 xe_exec.o \ 37 38 xe_exec_queue.o \ 38 39 xe_execlist.o \ ··· 61 60 xe_guc_capture.o \ 62 61 xe_guc_ct.o \ 63 62 xe_guc_db_mgr.o \ 63 + xe_guc_engine_activity.o \ 64 64 xe_guc_hwconfig.o \ 65 65 xe_guc_id_mgr.o \ 66 66 xe_guc_klv_helpers.o \ ··· 125 123 xe_wopcm.o 126 124 127 125 xe-$(CONFIG_HMM_MIRROR) += xe_hmm.o 126 + xe-$(CONFIG_DRM_GPUSVM) += xe_svm.o 128 127 129 128 # graphics hardware monitoring (HWMON) support 130 129 xe-$(CONFIG_HWMON) += xe_hwmon.o

+1

drivers/gpu/drm/xe/abi/guc_actions_abi.h

··· 140 140 XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC = 0x4601, 141 141 XE_GUC_ACTION_CLIENT_SOFT_RESET = 0x5507, 142 142 XE_GUC_ACTION_SET_ENG_UTIL_BUFF = 0x550A, 143 + XE_GUC_ACTION_SET_DEVICE_ENGINE_ACTIVITY_BUFFER = 0x550C, 143 144 XE_GUC_ACTION_NOTIFY_MEMORY_CAT_ERROR = 0x6000, 144 145 XE_GUC_ACTION_REPORT_PAGE_FAULT_REQ_DESC = 0x6002, 145 146 XE_GUC_ACTION_PAGE_FAULT_RES_DESC = 0x6003,

+3

drivers/gpu/drm/xe/abi/guc_actions_slpc_abi.h

··· 174 174 }; 175 175 } __packed; 176 176 177 + #define SLPC_CTX_FREQ_REQ_IS_COMPUTE REG_BIT(28) 178 + #define SLPC_OPTIMIZED_STRATEGY_COMPUTE REG_BIT(0) 179 + 177 180 struct slpc_shared_data_header { 178 181 /* Total size in bytes of this shared buffer. */ 179 182 u32 size;

+2 -11

drivers/gpu/drm/xe/display/xe_display.c

··· 170 170 intel_hpd_poll_fini(xe); 171 171 intel_hdcp_component_fini(display); 172 172 intel_audio_deinit(display); 173 + intel_display_driver_remove(display); 173 174 } 174 175 175 176 int xe_display_init(struct xe_device *xe) ··· 185 184 if (err) 186 185 return err; 187 186 188 - return xe_device_add_action_or_reset(xe, xe_display_fini, xe); 187 + return devm_add_action_or_reset(xe->drm.dev, xe_display_fini, xe); 189 188 } 190 189 191 190 void xe_display_register(struct xe_device *xe) ··· 208 207 209 208 intel_power_domains_disable(display); 210 209 intel_display_driver_unregister(display); 211 - } 212 - 213 - void xe_display_driver_remove(struct xe_device *xe) 214 - { 215 - struct intel_display *display = &xe->display; 216 - 217 - if (!xe->info.probe_display) 218 - return; 219 - 220 - intel_display_driver_remove(display); 221 210 } 222 211 223 212 /* IRQ-related functions */

-1

drivers/gpu/drm/xe/display/xe_display.h

··· 14 14 15 15 bool xe_display_driver_probe_defer(struct pci_dev *pdev); 16 16 void xe_display_driver_set_hooks(struct drm_driver *driver); 17 - void xe_display_driver_remove(struct xe_device *xe); 18 17 19 18 int xe_display_create(struct xe_device *xe); 20 19

-1

drivers/gpu/drm/xe/regs/xe_engine_regs.h

··· 53 53 54 54 #define RING_CTL(base) XE_REG((base) + 0x3c) 55 55 #define RING_CTL_SIZE(size) ((size) - PAGE_SIZE) /* in bytes -> pages */ 56 - #define RING_CTL_SIZE(size) ((size) - PAGE_SIZE) /* in bytes -> pages */ 57 56 58 57 #define RING_START_UDW(base) XE_REG((base) + 0x48) 59 58

+29

drivers/gpu/drm/xe/regs/xe_eu_stall_regs.h

··· 1 + /* SPDX-License-Identifier: MIT */ 2 + /* 3 + * Copyright © 2025 Intel Corporation 4 + */ 5 + 6 + #ifndef _XE_EU_STALL_REGS_H_ 7 + #define _XE_EU_STALL_REGS_H_ 8 + 9 + #include "regs/xe_reg_defs.h" 10 + 11 + #define XEHPC_EUSTALL_BASE XE_REG_MCR(0xe520) 12 + #define XEHPC_EUSTALL_BASE_BUF_ADDR REG_GENMASK(31, 6) 13 + #define XEHPC_EUSTALL_BASE_XECORE_BUF_SZ REG_GENMASK(5, 3) 14 + #define XEHPC_EUSTALL_BASE_ENABLE_SAMPLING REG_BIT(1) 15 + 16 + #define XEHPC_EUSTALL_BASE_UPPER XE_REG_MCR(0xe524) 17 + 18 + #define XEHPC_EUSTALL_REPORT XE_REG_MCR(0xe528, XE_REG_OPTION_MASKED) 19 + #define XEHPC_EUSTALL_REPORT_WRITE_PTR_MASK REG_GENMASK(15, 2) 20 + #define XEHPC_EUSTALL_REPORT_OVERFLOW_DROP REG_BIT(1) 21 + 22 + #define XEHPC_EUSTALL_REPORT1 XE_REG_MCR(0xe52c, XE_REG_OPTION_MASKED) 23 + #define XEHPC_EUSTALL_REPORT1_READ_PTR_MASK REG_GENMASK(15, 2) 24 + 25 + #define XEHPC_EUSTALL_CTRL XE_REG_MCR(0xe53c, XE_REG_OPTION_MASKED) 26 + #define EUSTALL_MOCS REG_GENMASK(9, 3) 27 + #define EUSTALL_SAMPLE_RATE REG_GENMASK(2, 0) 28 + 29 + #endif
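
The field masks in xe_eu_stall_regs.h above are the usual bit/genmask pairs. As a rough userspace illustration of how such masks pack and unpack a register value, the sketch below re-creates the EUSTALL_BASE layout (address in bits 31:6, buffer size code in bits 5:3, enable in bit 1). The `SK_`/`sk_` names are hypothetical stand-ins for the kernel's REG_BIT(), REG_GENMASK() and field helpers, and the way the driver actually composes the register is an assumption here.

```c
#include <stdint.h>

/* Hypothetical userspace stand-ins for REG_BIT() and REG_GENMASK(). */
#define SK_BIT(n)		((uint32_t)1 << (n))
#define SK_GENMASK(hi, lo)	\
	((~(uint32_t)0 >> (31 - (hi))) & (~(uint32_t)0 << (lo)))

/* Same bit positions as the XEHPC_EUSTALL_BASE fields in the diff. */
#define SK_EUSTALL_BASE_BUF_ADDR	SK_GENMASK(31, 6)
#define SK_EUSTALL_BASE_XECORE_BUF_SZ	SK_GENMASK(5, 3)
#define SK_EUSTALL_BASE_ENABLE_SAMPLING	SK_BIT(1)

/* Position of the lowest set bit; mask must be non-zero. */
static unsigned int sk_mask_shift(uint32_t mask)
{
	unsigned int shift = 0;

	while (!(mask & 1)) {
		mask >>= 1;
		shift++;
	}
	return shift;
}

/* Place val into the field described by mask. */
static uint32_t sk_field_prep(uint32_t mask, uint32_t val)
{
	return (val << sk_mask_shift(mask)) & mask;
}

/* Extract the field described by mask from reg. */
static uint32_t sk_field_get(uint32_t mask, uint32_t reg)
{
	return (reg & mask) >> sk_mask_shift(mask);
}

/*
 * Compose a base-register value. Assumption: the address bits are used
 * in place (64-byte aligned) and the size is passed as a 3-bit code.
 */
static uint32_t sk_eustall_base(uint32_t buf_addr, uint32_t buf_sz_code,
				int enable)
{
	uint32_t reg = buf_addr & SK_EUSTALL_BASE_BUF_ADDR;

	reg |= sk_field_prep(SK_EUSTALL_BASE_XECORE_BUF_SZ, buf_sz_code);
	if (enable)
		reg |= SK_EUSTALL_BASE_ENABLE_SAMPLING;
	return reg;
}
```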

+6 -1

drivers/gpu/drm/xe/regs/xe_gt_regs.h

··· 358 358 #define RENDER_AWAKE_STATUS REG_BIT(1) 359 359 #define MEDIA_SLICE0_AWAKE_STATUS REG_BIT(0) 360 360 361 + #define MISC_STATUS_0 XE_REG(0xa500) 362 + 361 363 #define FORCEWAKE_MEDIA_VDBOX(n) XE_REG(0xa540 + (n) * 4) 362 364 #define FORCEWAKE_MEDIA_VEBOX(n) XE_REG(0xa560 + (n) * 4) 363 365 #define FORCEWAKE_GSC XE_REG(0xa618) 364 366 367 + #define XELP_GARBCNTL XE_REG(0xb004) 368 + #define XELP_BUS_HASH_CTL_BIT_EXC REG_BIT(7) 369 + 365 370 #define XEHPC_LNCFMISCCFGREG0 XE_REG_MCR(0xb01c, XE_REG_OPTION_MASKED) 366 371 #define XEHPC_OVRLSCCC REG_BIT(0) 367 372 368 - /* L3 Cache Control */ 369 373 #define LNCFCMOCS_REG_COUNT 32 370 374 #define XELP_LNCFCMOCS(i) XE_REG(0xb020 + (i) * 4) 371 375 #define XEHP_LNCFCMOCS(i) XE_REG_MCR(0xb020 + (i) * 4) ··· 482 478 #define TDL_TSL_CHICKEN XE_REG_MCR(0xe4c4, XE_REG_OPTION_MASKED) 483 479 #define STK_ID_RESTRICT REG_BIT(12) 484 480 #define SLM_WMTP_RESTORE REG_BIT(11) 481 + #define RES_CHK_SPR_DIS REG_BIT(6) 485 482 486 483 #define ROW_CHICKEN XE_REG_MCR(0xe4f0, XE_REG_OPTION_MASKED) 487 484 #define UGM_BACKUP_MODE REG_BIT(13)

-4

drivers/gpu/drm/xe/regs/xe_regs.h

··· 7 7 8 8 #include "regs/xe_reg_defs.h" 9 9 10 - #define TIMESTAMP_OVERRIDE XE_REG(0x44074) 11 - #define TIMESTAMP_OVERRIDE_US_COUNTER_DENOMINATOR_MASK REG_GENMASK(15, 12) 12 - #define TIMESTAMP_OVERRIDE_US_COUNTER_DIVIDER_MASK REG_GENMASK(9, 0) 13 - 14 10 #define GU_CNTL_PROTECTED XE_REG(0x10100C) 15 11 #define DRIVERINT_FLR_DIS REG_BIT(31) 16 12

+13 -13

drivers/gpu/drm/xe/tests/xe_pci.c

··· 21 21 */ 22 22 void xe_call_for_each_graphics_ip(xe_graphics_fn xe_fn) 23 23 { 24 - const struct xe_graphics_desc *ip, *last = NULL; 24 + const struct xe_graphics_desc *desc, *last = NULL; 25 25 26 - for (int i = 0; i < ARRAY_SIZE(graphics_ip_map); i++) { 27 - ip = graphics_ip_map[i].ip; 28 - if (ip == last) 26 + for (int i = 0; i < ARRAY_SIZE(graphics_ips); i++) { 27 + desc = graphics_ips[i].desc; 28 + if (desc == last) 29 29 continue; 30 30 31 - xe_fn(ip); 32 - last = ip; 31 + xe_fn(desc); 32 + last = desc; 33 33 } 34 34 } 35 35 EXPORT_SYMBOL_IF_KUNIT(xe_call_for_each_graphics_ip); ··· 43 43 */ 44 44 void xe_call_for_each_media_ip(xe_media_fn xe_fn) 45 45 { 46 - const struct xe_media_desc *ip, *last = NULL; 46 + const struct xe_media_desc *desc, *last = NULL; 47 47 48 - for (int i = 0; i < ARRAY_SIZE(media_ip_map); i++) { 49 - ip = media_ip_map[i].ip; 50 - if (ip == last) 48 + for (int i = 0; i < ARRAY_SIZE(media_ips); i++) { 49 + desc = media_ips[i].desc; 50 + if (desc == last) 51 51 continue; 52 52 53 - xe_fn(ip); 54 - last = ip; 53 + xe_fn(desc); 54 + last = desc; 55 55 } 56 56 } 57 57 EXPORT_SYMBOL_IF_KUNIT(xe_call_for_each_media_ip); ··· 110 110 kunit_activate_static_stub(test, read_gmdid, fake_read_gmdid); 111 111 112 112 xe_info_init_early(xe, desc, subplatform_desc); 113 - xe_info_init(xe, desc->graphics, desc->media); 113 + xe_info_init(xe, desc); 114 114 115 115 return 0; 116 116 }
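
The renamed loops in xe_pci.c above call the test callback once per descriptor by skipping an entry whose descriptor pointer equals the previous one. A small standalone sketch of that pattern follows. The `sk_` types and the `verx100` field are hypothetical, and the real table layout may differ. Note that the check only collapses adjacent duplicates, so it relies on the table keeping entries that share a descriptor next to each other.

```c
#include <stddef.h>

/* Hypothetical descriptor and table entry types. */
struct sk_desc {
	const char *name;
};

struct sk_ip_entry {
	unsigned int verx100;
	const struct sk_desc *desc;
};

typedef void (*sk_desc_fn)(const struct sk_desc *desc, void *ctx);

/*
 * Call fn once per run of identical descriptor pointers.
 * Returns the number of calls made.
 */
static size_t sk_for_each_desc(const struct sk_ip_entry *ips, size_t n,
			       sk_desc_fn fn, void *ctx)
{
	const struct sk_desc *last = NULL;
	size_t calls = 0;

	for (size_t i = 0; i < n; i++) {
		const struct sk_desc *desc = ips[i].desc;

		if (desc == last)
			continue;

		fn(desc, ctx);
		last = desc;
		calls++;
	}
	return calls;
}

/* Example callback: counts invocations through ctx. */
static void sk_count(const struct sk_desc *desc, void *ctx)
{
	(void)desc;
	++*(size_t *)ctx;
}
```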

+54

drivers/gpu/drm/xe/xe_bo.c

··· 281 281 static void xe_evict_flags(struct ttm_buffer_object *tbo, 282 282 struct ttm_placement *placement) 283 283 { 284 + struct xe_bo *bo; 285 + 284 286 if (!xe_bo_is_xe_bo(tbo)) { 285 287 /* Don't handle scatter gather BOs */ 286 288 if (tbo->type == ttm_bo_type_sg) { ··· 290 288 return; 291 289 } 292 290 291 + *placement = sys_placement; 292 + return; 293 + } 294 + 295 + bo = ttm_to_xe_bo(tbo); 296 + if (bo->flags & XE_BO_FLAG_CPU_ADDR_MIRROR) { 293 297 *placement = sys_placement; 294 298 return; 295 299 } ··· 794 786 795 787 if ((move_lacks_source && !needs_clear)) { 796 788 ttm_bo_move_null(ttm_bo, new_mem); 789 + goto out; 790 + } 791 + 792 + if (!move_lacks_source && (bo->flags & XE_BO_FLAG_CPU_ADDR_MIRROR) && 793 + new_mem->mem_type == XE_PL_SYSTEM) { 794 + ret = xe_svm_bo_evict(bo); 795 + if (!ret) { 796 + drm_dbg(&xe->drm, "Evict system allocator BO success\n"); 797 + ttm_bo_move_null(ttm_bo, new_mem); 798 + } else { 799 + drm_dbg(&xe->drm, "Evict system allocator BO failed=%pe\n", 800 + ERR_PTR(ret)); 801 + } 802 + 797 803 goto out; 798 804 } 799 805 ··· 2463 2441 struct xe_file *xef = to_xe_file(file); 2464 2442 struct drm_xe_gem_create *args = data; 2465 2443 struct xe_vm *vm = NULL; 2444 + ktime_t end = 0; 2466 2445 struct xe_bo *bo; 2467 2446 unsigned int bo_flags; 2468 2447 u32 handle; ··· 2535 2512 vm = xe_vm_lookup(xef, args->vm_id); 2536 2513 if (XE_IOCTL_DBG(xe, !vm)) 2537 2514 return -ENOENT; 2515 + } 2516 + 2517 + retry: 2518 + if (vm) { 2538 2519 err = xe_vm_lock(vm, true); 2539 2520 if (err) 2540 2521 goto out_vm; ··· 2552 2525 2553 2526 if (IS_ERR(bo)) { 2554 2527 err = PTR_ERR(bo); 2528 + if (xe_vm_validate_should_retry(NULL, err, &end)) 2529 + goto retry; 2555 2530 goto out_vm; 2556 2531 } 2557 2532 ··· 2848 2819 2849 2820 llist_for_each_entry_safe(bo, next, freed, freed) 2850 2821 drm_gem_object_free(&bo->ttm.base.refcount); 2822 + } 2823 + 2824 + static void xe_bo_dev_work_func(struct work_struct *work) 2825 + { 2826 + struct 
xe_bo_dev *bo_dev = container_of(work, typeof(*bo_dev), async_free); 2827 + 2828 + xe_bo_put_commit(&bo_dev->async_list); 2829 + } 2830 + 2831 + /** 2832 + * xe_bo_dev_init() - Initialize BO dev to manage async BO freeing 2833 + * @bo_dev: The BO dev structure 2834 + */ 2835 + void xe_bo_dev_init(struct xe_bo_dev *bo_dev) 2836 + { 2837 + INIT_WORK(&bo_dev->async_free, xe_bo_dev_work_func); 2838 + } 2839 + 2840 + /** 2841 + * xe_bo_dev_fini() - Finalize BO dev managing async BO freeing 2842 + * @bo_dev: The BO dev structure 2843 + */ 2844 + void xe_bo_dev_fini(struct xe_bo_dev *bo_dev) 2845 + { 2846 + flush_work(&bo_dev->async_free); 2851 2847 } 2852 2848 2853 2849 void xe_bo_put(struct xe_bo *bo)

+20

drivers/gpu/drm/xe/xe_bo.h

··· 47 47 XE_BO_FLAG_GGTT1 | \ 48 48 XE_BO_FLAG_GGTT2 | \ 49 49 XE_BO_FLAG_GGTT3) 50 + #define XE_BO_FLAG_CPU_ADDR_MIRROR BIT(22) 50 51 51 52 /* this one is trigger internally only */ 52 53 #define XE_BO_FLAG_INTERNAL_TEST BIT(30) ··· 345 344 } 346 345 347 346 void xe_bo_put_commit(struct llist_head *deferred); 347 + 348 + /** 349 + * xe_bo_put_async() - Put BO async 350 + * @bo: The bo to put. 351 + * 352 + * Put BO async, the final put is deferred to a worker to exit an IRQ context. 353 + */ 354 + static inline void 355 + xe_bo_put_async(struct xe_bo *bo) 356 + { 357 + struct xe_bo_dev *bo_device = &xe_bo_device(bo)->bo_device; 358 + 359 + if (xe_bo_put_deferred(bo, &bo_device->async_list)) 360 + schedule_work(&bo_device->async_free); 361 + } 362 + 363 + void xe_bo_dev_init(struct xe_bo_dev *bo_device); 364 + 365 + void xe_bo_dev_fini(struct xe_bo_dev *bo_device); 348 366 349 367 struct sg_table *xe_bo_sg(struct xe_bo *bo); 350 368
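
xe_bo_put_async() in the hunk above queues the final put on a lock-less list and schedules the worker only when xe_bo_put_deferred() returns true, which happens when the list was empty before the add. The sketch below shows that "first producer kicks the worker" idea in userspace with C11 atomics. All `sk_` names are hypothetical, the plain int refcount stands in for the kernel's kref and is not thread-safe, and marking an object as freed stands in for the real release.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct sk_node {
	struct sk_node *next;
};

struct sk_list {
	_Atomic(struct sk_node *) first;
};

static void sk_list_init(struct sk_list *list)
{
	atomic_init(&list->first, (struct sk_node *)NULL);
}

/* Push node; returns true if the list was empty, like llist_add(). */
static bool sk_list_add(struct sk_node *node, struct sk_list *list)
{
	struct sk_node *first = atomic_load(&list->first);

	do {
		node->next = first;
	} while (!atomic_compare_exchange_weak(&list->first, &first, node));

	return first == NULL;
}

/* Detach the whole list in one step, like llist_del_all(). */
static struct sk_node *sk_list_del_all(struct sk_list *list)
{
	return atomic_exchange(&list->first, (struct sk_node *)NULL);
}

struct sk_bo {
	int refcount;		/* simplified, non-atomic stand-in for kref */
	bool freed;
	struct sk_node freed_node;
};

/*
 * Drop one reference. On the final put the object is queued, and the
 * return value tells the caller whether the worker needs scheduling.
 */
static bool sk_bo_put_deferred(struct sk_bo *bo, struct sk_list *deferred)
{
	if (--bo->refcount > 0)
		return false;

	return sk_list_add(&bo->freed_node, deferred);
}

/* Worker body: release everything queued so far; returns the count. */
static size_t sk_bo_put_commit(struct sk_list *deferred)
{
	struct sk_node *node = sk_list_del_all(deferred);
	size_t n = 0;

	while (node) {
		struct sk_node *next = node->next;
		struct sk_bo *bo = (struct sk_bo *)((char *)node -
				offsetof(struct sk_bo, freed_node));

		bo->freed = true;
		n++;
		node = next;
	}
	return n;
}
```

Scheduling only on the empty-to-non-empty transition keeps the number of worker wake-ups low when many objects are released back to back.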

+4

drivers/gpu/drm/xe/xe_bo_types.h

··· 8 8 9 9 #include <linux/iosys-map.h> 10 10 11 + #include <drm/drm_gpusvm.h> 11 12 #include <drm/ttm/ttm_bo.h> 12 13 #include <drm/ttm/ttm_device.h> 13 14 #include <drm/ttm/ttm_placement.h> ··· 80 79 * WB. 81 80 */ 82 81 u16 cpu_caching; 82 + 83 + /** @devmem_allocation: SVM device memory allocation */ 84 + struct drm_gpusvm_devmem devmem_allocation; 83 85 84 86 /** @vram_userfault_link: Link into @mem_access.vram_userfault.list */ 85 87 struct list_head vram_userfault_link;

+4 -4

drivers/gpu/drm/xe/xe_devcoredump.c

··· 237 237 238 238 /* 239 239 * NB: Despite passing a GFP_ flags parameter here, more allocations are done 240 - * internally using GFP_KERNEL expliictly. Hence this call must be in the worker 240 + * internally using GFP_KERNEL explicitly. Hence this call must be in the worker 241 241 * thread and not in the initial capture call. 242 242 */ 243 243 dev_coredumpm_timeout(gt_to_xe(ss->gt)->drm.dev, THIS_MODULE, coredump, 0, GFP_KERNEL, ··· 423 423 if (size & 3) 424 424 drm_printf(p, "Size not word aligned: %zu", size); 425 425 if (offset & 3) 426 - drm_printf(p, "Offset not word aligned: %zu", size); 426 + drm_printf(p, "Offset not word aligned: %zu", offset); 427 427 428 428 line_buff = kzalloc(DMESG_MAX_LINE_LEN, GFP_KERNEL); 429 - if (IS_ERR_OR_NULL(line_buff)) { 430 - drm_printf(p, "Failed to allocate line buffer: %pe", line_buff); 429 + if (!line_buff) { 430 + drm_printf(p, "Failed to allocate line buffer\n"); 431 431 return; 432 432 } 433 433
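
Two of the xe_devcoredump.c fixes above are small but easy to repeat: the "Offset not word aligned" message printed the size instead of the offset, and the kzalloc() result was tested with IS_ERR_OR_NULL() even though kzalloc() reports failure as NULL. A minimal userspace sketch of the corrected alignment reporting is shown below. The `sk_` helpers are hypothetical and write into a caller-supplied buffer instead of a drm_printer.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Append one "<what> not word aligned" message to out.
 * Requires used < outlen. Returns the new length of the text in out.
 */
static size_t sk_append(char *out, size_t outlen, size_t used,
			const char *what, size_t value)
{
	int n = snprintf(out + used, outlen - used,
			 "%s not word aligned: %zu\n", what, value);

	if (n < 0)
		return used;
	if ((size_t)n >= outlen - used)
		return outlen - 1;	/* truncated: buffer is full */
	return used + (size_t)n;
}

/*
 * Report word-alignment problems for an (offset, size) pair.
 * outlen must be at least 1. Returns the number of problems found.
 */
static int sk_report_unaligned(char *out, size_t outlen, size_t offset,
			       size_t size)
{
	size_t used = 0;
	int problems = 0;

	out[0] = '\0';

	if (size & 3) {
		used = sk_append(out, outlen, used, "Size", size);
		problems++;
	}
	if (offset & 3) {
		/* Print the offset itself, not the size. */
		used = sk_append(out, outlen, used, "Offset", offset);
		problems++;
	}
	return problems;
}
```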

+15 -86

drivers/gpu/drm/xe/xe_device.c

···
 #include "xe_query.h"
 #include "xe_shrinker.h"
 #include "xe_sriov.h"
-#include "xe_survivability_mode.h"
 #include "xe_tile.h"
 #include "xe_ttm_stolen_mgr.h"
 #include "xe_ttm_sys_mgr.h"
···
 #include "xe_wa.h"

 #include <generated/xe_wa_oob.h>
-
-struct xe_device_remove_action {
-	struct list_head node;
-	void (*action)(void *);
-	void *data;
-};

 static int xe_file_open(struct drm_device *dev, struct drm_file *file)
 {
···
 {
 	struct xe_device *xe = to_xe_device(dev);

+	xe_bo_dev_fini(&xe->bo_device);
+
 	if (xe->preempt_fence_wq)
 		destroy_workqueue(xe->preempt_fence_wq);

···
 	if (WARN_ON(err))
 		goto err;

+	xe_bo_dev_init(&xe->bo_device);
 	err = drmm_add_action_or_reset(&xe->drm, xe_device_destroy, NULL);
 	if (err)
 		goto err;
···
 }
 ALLOW_ERROR_INJECTION(wait_for_lmem_ready, ERRNO); /* See xe_pci_probe() */

-static void update_device_info(struct xe_device *xe)
+static void sriov_update_device_info(struct xe_device *xe)
 {
 	/* disable features that are not available/applicable to VFs */
 	if (IS_SRIOV_VF(xe)) {
···

 	xe_sriov_probe_early(xe);

-	update_device_info(xe);
+	sriov_update_device_info(xe);

 	err = xe_pcode_probe_early(xe);
-	if (err) {
-		if (xe_survivability_mode_required(xe))
-			xe_survivability_mode_init(xe);
-
+	if (err)
 		return err;
-	}

 	err = wait_for_lmem_ready(xe);
 	if (err)
···
 	int err;
 	u8 id;

-	xe->probing = true;
-	INIT_LIST_HEAD(&xe->remove_action_list);
-
 	xe_pat_init_early(xe);

 	err = xe_sriov_init(xe);
···
 		return err;

 	xe->info.mem_region_mask = 1;
+
 	err = xe_set_dma_info(xe);
 	if (err)
 		return err;
···
 	if (err)
 		return err;

-	xe_ttm_sys_mgr_init(xe);
+	err = xe_ttm_sys_mgr_init(xe);
+	if (err)
+		return err;

 	for_each_gt(gt, xe, id) {
 		err = xe_gt_init_early(gt);
···
 			return err;
 	}

-	xe_heci_gsc_init(xe);
+	err = xe_heci_gsc_init(xe);
+	if (err)
+		return err;

 	err = xe_oa_init(xe);
 	if (err)
···

 	err = xe_pxp_init(xe);
 	if (err)
-		goto err_remove_display;
+		return err;

 	err = drm_dev_register(&xe->drm, 0);
 	if (err)
-		goto err_remove_display;
+		return err;

 	xe_display_register(xe);

···

 	xe_vsec_init(xe);

-	xe->probing = false;
-
 	return devm_add_action_or_reset(xe->drm.dev, xe_device_sanitize, xe);

 err_unregister_display:
 	xe_display_unregister(xe);
-err_remove_display:
-	xe_display_driver_remove(xe);

 	return err;
-}
-
-/**
- * xe_device_call_remove_actions - Call the remove actions
- * @xe: xe device instance
- *
- * This is only to be used by xe_pci and xe_device to call the remove actions
- * while removing the driver or handling probe failures.
- */
-void xe_device_call_remove_actions(struct xe_device *xe)
-{
-	struct xe_device_remove_action *ra, *tmp;
-
-	list_for_each_entry_safe(ra, tmp, &xe->remove_action_list, node) {
-		ra->action(ra->data);
-		list_del(&ra->node);
-		kfree(ra);
-	}
-
-	xe->probing = false;
-}
-
-/**
- * xe_device_add_action_or_reset - Add an action to run on driver removal
- * @xe: xe device instance
- * @action: Function that should be called on device remove
- * @data: Pointer to data passed to @action implementation
- *
- * This adds a custom action to the list of remove callbacks executed on device
- * remove, before any dev or drm managed resources are removed. This is only
- * needed if the action leads to component_del()/component_master_del() since
- * that is not compatible with devres cleanup.
- *
- * Returns: 0 on success or a negative error code on failure, in which case
- * @action is already called.
- */
-int xe_device_add_action_or_reset(struct xe_device *xe,
-				  void (*action)(void *), void *data)
-{
-	struct xe_device_remove_action *ra;
-
-	drm_WARN_ON(&xe->drm, !xe->probing);
-
-	ra = kmalloc(sizeof(*ra), GFP_KERNEL);
-	if (!ra) {
-		action(data);
-		return -ENOMEM;
-	}
-
-	INIT_LIST_HEAD(&ra->node);
-	ra->action = action;
-	ra->data = data;
-	list_add(&ra->node, &xe->remove_action_list);
-
-	return 0;
 }

 void xe_device_remove(struct xe_device *xe)
···
 	xe_display_unregister(xe);

 	drm_dev_unplug(&xe->drm);
-
-	xe_display_driver_remove(xe);
-
-	xe_heci_gsc_fini(xe);
-
-	xe_device_call_remove_actions(xe);
 }

 void xe_device_shutdown(struct xe_device *xe)
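The removed xe_device_add_action_or_reset() documents an "or reset" contract: on failure the action has already been called, so the caller only needs to propagate the error. The sketch below illustrates that contract in userspace under stated assumptions. All `demo_*` names are hypothetical, a fixed-size array replaces the kmalloc()-backed list so that the failure path is easy to trigger, and cleanup runs in last-in, first-out order, which matches what the removed list_add()-based helper would do.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_MAX_ACTIONS 2	/* tiny on purpose, so failure is easy to hit */

struct demo_action {
	void (*action)(void *);
	void *data;
};

/* Hypothetical device that owns a stack of cleanup actions. */
struct demo_dev {
	struct demo_action actions[DEMO_MAX_ACTIONS];
	size_t count;
};

/*
 * Register a cleanup action. If registration fails, run the action
 * immediately so the caller never has to undo the step by hand.
 */
static int demo_add_action_or_reset(struct demo_dev *dev,
				    void (*action)(void *), void *data)
{
	if (dev->count == DEMO_MAX_ACTIONS) {
		action(data);
		return -ENOMEM;
	}

	dev->actions[dev->count].action = action;
	dev->actions[dev->count].data = data;
	dev->count++;
	return 0;
}

/* Run all registered actions, most recently added first. */
static void demo_release_all(struct demo_dev *dev)
{
	while (dev->count) {
		struct demo_action *a = &dev->actions[--dev->count];

		a->action(a->data);
	}
}

/* Example cleanup: counts how many times it ran for a given resource. */
static void demo_count_cleanup(void *data)
{
	++*(int *)data;
}
```

With this contract, a probe path can return the error directly after a failed registration, which is the shape the `return err;` changes in this file take once the custom remove-action list is gone.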

-3

drivers/gpu/drm/xe/xe_device.h

···
 				   const struct pci_device_id *ent);
 int xe_device_probe_early(struct xe_device *xe);
 int xe_device_probe(struct xe_device *xe);
-int xe_device_add_action_or_reset(struct xe_device *xe,
-				  void (*action)(void *), void *data);
-void xe_device_call_remove_actions(struct xe_device *xe);
 void xe_device_remove(struct xe_device *xe);
 void xe_device_shutdown(struct xe_device *xe);


-6

drivers/gpu/drm/xe/xe_device_sysfs.c

···
 	struct xe_device *xe = pdev_to_xe_device(pdev);
 	int ret;

-	if (!xe)
-		return -EINVAL;
-
 	xe_pm_runtime_get(xe);
 	ret = sysfs_emit(buf, "%d\n", xe->d3cold.vram_threshold);
 	xe_pm_runtime_put(xe);
···
 	struct xe_device *xe = pdev_to_xe_device(pdev);
 	u32 vram_d3cold_threshold;
 	int ret;
-
-	if (!xe)
-		return -EINVAL;

 	ret = kstrtou32(buff, 0, &vram_d3cold_threshold);
 	if (ret)

+22 -14

drivers/gpu/drm/xe/xe_device_types.h

···

 #include <drm/drm_device.h>
 #include <drm/drm_file.h>
+#include <drm/drm_pagemap.h>
 #include <drm/ttm/ttm_device.h>

 #include "xe_devcoredump_types.h"
···
 	resource_size_t actual_physical_size;
 	/** @mapping: pointer to VRAM mappable space */
 	void __iomem *mapping;
+	/** @pagemap: Used to remap device memory as ZONE_DEVICE */
+	struct dev_pagemap pagemap;
+	/**
+	 * @dpagemap: The struct drm_pagemap of the ZONE_DEVICE memory
+	 * pages of this tile.
+	 */
+	struct drm_pagemap dpagemap;
+	/**
+	 * @hpa_base: base host physical address
+	 *
+	 * This is generated when remap device memory as ZONE_DEVICE
+	 */
+	resource_size_t hpa_base;
 	/** @ttm: VRAM TTM manager */
 	struct xe_ttm_vram_mgr ttm;
 };
···
 	struct xe_tile tiles[XE_MAX_TILES_PER_DEVICE];

 	/**
-	 * @remove_action_list: list of actions to execute on device remove.
-	 * Use xe_device_add_remove_action() for that. Actions can only be added
-	 * during probe and are executed during the call from PCI subsystem to
-	 * remove the driver from the device.
-	 */
-	struct list_head remove_action_list;
-
-	/**
-	 * @probing: cover the section in which @remove_action_list can be used
-	 * to post cleaning actions
-	 */
-	bool probing;
-
-	/**
 	 * @mem_access: keep track of memory access in the device, possibly
 	 * triggering additional actions when they occur.
 	 */
···
 		/** @wedged.mode: Mode controlled by kernel parameter and debugfs */
 		int mode;
 	} wedged;
+
+	/** @bo_device: Struct to control async free of BOs */
+	struct xe_bo_dev {
+		/** @bo_device.async_free: Free worker */
+		struct work_struct async_free;
+		/** @bo_device.async_list: List of BOs to be freed */
+		struct llist_head async_list;
+	} bo_device;

 	/** @pmu: performance monitoring unit */
 	struct xe_pmu pmu;

+960

drivers/gpu/drm/xe/xe_eu_stall.c

··· 1 + // SPDX-License-Identifier: MIT 2 + /* 3 + * Copyright © 2025 Intel Corporation 4 + */ 5 + 6 + #include <linux/anon_inodes.h> 7 + #include <linux/fs.h> 8 + #include <linux/poll.h> 9 + #include <linux/types.h> 10 + 11 + #include <drm/drm_drv.h> 12 + #include <generated/xe_wa_oob.h> 13 + #include <uapi/drm/xe_drm.h> 14 + 15 + #include "xe_bo.h" 16 + #include "xe_device.h" 17 + #include "xe_eu_stall.h" 18 + #include "xe_force_wake.h" 19 + #include "xe_gt_mcr.h" 20 + #include "xe_gt_printk.h" 21 + #include "xe_gt_topology.h" 22 + #include "xe_macros.h" 23 + #include "xe_observation.h" 24 + #include "xe_pm.h" 25 + #include "xe_trace.h" 26 + #include "xe_wa.h" 27 + 28 + #include "regs/xe_eu_stall_regs.h" 29 + #include "regs/xe_gt_regs.h" 30 + 31 + #define POLL_PERIOD_MS 5 32 + 33 + static size_t per_xecore_buf_size = SZ_512K; 34 + 35 + struct per_xecore_buf { 36 + /* Buffer vaddr */ 37 + u8 *vaddr; 38 + /* Write pointer */ 39 + u32 write; 40 + /* Read pointer */ 41 + u32 read; 42 + }; 43 + 44 + struct xe_eu_stall_data_stream { 45 + bool pollin; 46 + bool enabled; 47 + int wait_num_reports; 48 + int sampling_rate_mult; 49 + wait_queue_head_t poll_wq; 50 + size_t data_record_size; 51 + size_t per_xecore_buf_size; 52 + 53 + struct xe_gt *gt; 54 + struct xe_bo *bo; 55 + struct per_xecore_buf *xecore_buf; 56 + struct { 57 + bool reported_to_user; 58 + xe_dss_mask_t mask; 59 + } data_drop; 60 + struct delayed_work buf_poll_work; 61 + }; 62 + 63 + struct xe_eu_stall_gt { 64 + /* Lock to protect stream */ 65 + struct mutex stream_lock; 66 + /* EU stall data stream */ 67 + struct xe_eu_stall_data_stream *stream; 68 + /* Workqueue to schedule buffer pointers polling work */ 69 + struct workqueue_struct *buf_ptr_poll_wq; 70 + }; 71 + 72 + /** 73 + * struct eu_stall_open_properties - EU stall sampling properties received 74 + * from user space at open. 75 + * @sampling_rate_mult: EU stall sampling rate multiplier. 
76 + * HW will sample every (sampling_rate_mult x 251) cycles. 77 + * @wait_num_reports: Minimum number of EU stall data reports to unblock poll(). 78 + * @gt: GT on which EU stall data will be captured. 79 + */ 80 + struct eu_stall_open_properties { 81 + int sampling_rate_mult; 82 + int wait_num_reports; 83 + struct xe_gt *gt; 84 + }; 85 + 86 + /* 87 + * EU stall data format for PVC 88 + */ 89 + struct xe_eu_stall_data_pvc { 90 + __u64 ip_addr:29; /* Bits 0 to 28 */ 91 + __u64 active_count:8; /* Bits 29 to 36 */ 92 + __u64 other_count:8; /* Bits 37 to 44 */ 93 + __u64 control_count:8; /* Bits 45 to 52 */ 94 + __u64 pipestall_count:8; /* Bits 53 to 60 */ 95 + __u64 send_count:8; /* Bits 61 to 68 */ 96 + __u64 dist_acc_count:8; /* Bits 69 to 76 */ 97 + __u64 sbid_count:8; /* Bits 77 to 84 */ 98 + __u64 sync_count:8; /* Bits 85 to 92 */ 99 + __u64 inst_fetch_count:8; /* Bits 93 to 100 */ 100 + __u64 unused_bits:27; 101 + __u64 unused[6]; 102 + } __packed; 103 + 104 + /* 105 + * EU stall data format for Xe2 arch GPUs (LNL, BMG). 106 + */ 107 + struct xe_eu_stall_data_xe2 { 108 + __u64 ip_addr:29; /* Bits 0 to 28 */ 109 + __u64 tdr_count:8; /* Bits 29 to 36 */ 110 + __u64 other_count:8; /* Bits 37 to 44 */ 111 + __u64 control_count:8; /* Bits 45 to 52 */ 112 + __u64 pipestall_count:8; /* Bits 53 to 60 */ 113 + __u64 send_count:8; /* Bits 61 to 68 */ 114 + __u64 dist_acc_count:8; /* Bits 69 to 76 */ 115 + __u64 sbid_count:8; /* Bits 77 to 84 */ 116 + __u64 sync_count:8; /* Bits 85 to 92 */ 117 + __u64 inst_fetch_count:8; /* Bits 93 to 100 */ 118 + __u64 active_count:8; /* Bits 101 to 108 */ 119 + __u64 ex_id:3; /* Bits 109 to 111 */ 120 + __u64 end_flag:1; /* Bit 112 */ 121 + __u64 unused_bits:15; 122 + __u64 unused[6]; 123 + } __packed; 124 + 125 + const u64 eu_stall_sampling_rates[] = {251, 251 * 2, 251 * 3, 251 * 4, 251 * 5, 251 * 6, 251 * 7}; 126 + 127 + /** 128 + * xe_eu_stall_get_sampling_rates - get EU stall sampling rates information. 
129 + * 130 + * @num_rates: Pointer to a u32 to return the number of sampling rates. 131 + * @rates: double u64 pointer to point to an array of sampling rates. 132 + * 133 + * Stores the number of sampling rates and pointer to the array of 134 + * sampling rates in the input pointers. 135 + * 136 + * Returns: Size of the EU stall sampling rates array. 137 + */ 138 + size_t xe_eu_stall_get_sampling_rates(u32 *num_rates, const u64 **rates) 139 + { 140 + *num_rates = ARRAY_SIZE(eu_stall_sampling_rates); 141 + *rates = eu_stall_sampling_rates; 142 + 143 + return sizeof(eu_stall_sampling_rates); 144 + } 145 + 146 + /** 147 + * xe_eu_stall_get_per_xecore_buf_size - get per XeCore buffer size. 148 + * 149 + * Returns: The per XeCore buffer size used to allocate the per GT 150 + * EU stall data buffer. 151 + */ 152 + size_t xe_eu_stall_get_per_xecore_buf_size(void) 153 + { 154 + return per_xecore_buf_size; 155 + } 156 + 157 + /** 158 + * xe_eu_stall_data_record_size - get EU stall data record size. 159 + * 160 + * @xe: Pointer to a Xe device. 161 + * 162 + * Returns: EU stall data record size. 163 + */ 164 + size_t xe_eu_stall_data_record_size(struct xe_device *xe) 165 + { 166 + size_t record_size = 0; 167 + 168 + if (xe->info.platform == XE_PVC) 169 + record_size = sizeof(struct xe_eu_stall_data_pvc); 170 + else if (GRAPHICS_VER(xe) >= 20) 171 + record_size = sizeof(struct xe_eu_stall_data_xe2); 172 + 173 + xe_assert(xe, is_power_of_2(record_size)); 174 + 175 + return record_size; 176 + } 177 + 178 + /** 179 + * num_data_rows - Return the number of EU stall data rows of 64B each 180 + * for a given data size. 
181 + * 182 + * @data_size: EU stall data size 183 + */ 184 + static u32 num_data_rows(u32 data_size) 185 + { 186 + return data_size >> 6; 187 + } 188 + 189 + static void xe_eu_stall_fini(void *arg) 190 + { 191 + struct xe_gt *gt = arg; 192 + 193 + destroy_workqueue(gt->eu_stall->buf_ptr_poll_wq); 194 + mutex_destroy(&gt->eu_stall->stream_lock); 195 + kfree(gt->eu_stall); 196 + } 197 + 198 + /** 199 + * xe_eu_stall_init() - Allocate and initialize GT level EU stall data 200 + * structure xe_eu_stall_gt within struct xe_gt. 201 + * 202 + * @gt: GT being initialized. 203 + * 204 + * Returns: zero on success or a negative error code. 205 + */ 206 + int xe_eu_stall_init(struct xe_gt *gt) 207 + { 208 + struct xe_device *xe = gt_to_xe(gt); 209 + int ret; 210 + 211 + gt->eu_stall = kzalloc(sizeof(*gt->eu_stall), GFP_KERNEL); 212 + if (!gt->eu_stall) { 213 + ret = -ENOMEM; 214 + goto exit; 215 + } 216 + 217 + mutex_init(&gt->eu_stall->stream_lock); 218 + 219 + gt->eu_stall->buf_ptr_poll_wq = alloc_ordered_workqueue("xe_eu_stall", 0); 220 + if (!gt->eu_stall->buf_ptr_poll_wq) { 221 + ret = -ENOMEM; 222 + goto exit_free; 223 + } 224 + 225 + ret = devm_add_action_or_reset(xe->drm.dev, xe_eu_stall_fini, gt); 226 + if (ret) 227 + goto exit_destroy; 228 + 229 + return 0; 230 + exit_destroy: 231 + destroy_workqueue(gt->eu_stall->buf_ptr_poll_wq); 232 + exit_free: 233 + mutex_destroy(&gt->eu_stall->stream_lock); 234 + kfree(gt->eu_stall); 235 + exit: 236 + return ret; 237 + } 238 + 239 + static int set_prop_eu_stall_sampling_rate(struct xe_device *xe, u64 value, 240 + struct eu_stall_open_properties *props) 241 + { 242 + value = div_u64(value, 251); 243 + if (value == 0 || value > 7) { 244 + drm_dbg(&xe->drm, "Invalid EU stall sampling rate %llu\n", value); 245 + return -EINVAL; 246 + } 247 + props->sampling_rate_mult = value; 248 + return 0; 249 + } 250 + 251 + static int set_prop_eu_stall_wait_num_reports(struct xe_device *xe, u64 value, 252 + struct eu_stall_open_properties 
*props) 253 + { 254 + props->wait_num_reports = value; 255 + 256 + return 0; 257 + } 258 + 259 + static int set_prop_eu_stall_gt_id(struct xe_device *xe, u64 value, 260 + struct eu_stall_open_properties *props) 261 + { 262 + if (value >= xe->info.gt_count) { 263 + drm_dbg(&xe->drm, "Invalid GT ID %llu for EU stall sampling\n", value); 264 + return -EINVAL; 265 + } 266 + props->gt = xe_device_get_gt(xe, value); 267 + return 0; 268 + } 269 + 270 + typedef int (*set_eu_stall_property_fn)(struct xe_device *xe, u64 value, 271 + struct eu_stall_open_properties *props); 272 + 273 + static const set_eu_stall_property_fn xe_set_eu_stall_property_funcs[] = { 274 + [DRM_XE_EU_STALL_PROP_SAMPLE_RATE] = set_prop_eu_stall_sampling_rate, 275 + [DRM_XE_EU_STALL_PROP_WAIT_NUM_REPORTS] = set_prop_eu_stall_wait_num_reports, 276 + [DRM_XE_EU_STALL_PROP_GT_ID] = set_prop_eu_stall_gt_id, 277 + }; 278 + 279 + static int xe_eu_stall_user_ext_set_property(struct xe_device *xe, u64 extension, 280 + struct eu_stall_open_properties *props) 281 + { 282 + u64 __user *address = u64_to_user_ptr(extension); 283 + struct drm_xe_ext_set_property ext; 284 + int err; 285 + u32 idx; 286 + 287 + err = __copy_from_user(&ext, address, sizeof(ext)); 288 + if (XE_IOCTL_DBG(xe, err)) 289 + return -EFAULT; 290 + 291 + if (XE_IOCTL_DBG(xe, ext.property >= ARRAY_SIZE(xe_set_eu_stall_property_funcs)) || 292 + XE_IOCTL_DBG(xe, ext.pad)) 293 + return -EINVAL; 294 + 295 + idx = array_index_nospec(ext.property, ARRAY_SIZE(xe_set_eu_stall_property_funcs)); 296 + return xe_set_eu_stall_property_funcs[idx](xe, ext.value, props); 297 + } 298 + 299 + typedef int (*xe_eu_stall_user_extension_fn)(struct xe_device *xe, u64 extension, 300 + struct eu_stall_open_properties *props); 301 + static const xe_eu_stall_user_extension_fn xe_eu_stall_user_extension_funcs[] = { 302 + [DRM_XE_EU_STALL_EXTENSION_SET_PROPERTY] = xe_eu_stall_user_ext_set_property, 303 + }; 304 + 305 + #define MAX_USER_EXTENSIONS 5 306 + static int 
xe_eu_stall_user_extensions(struct xe_device *xe, u64 extension, 307 + int ext_number, struct eu_stall_open_properties *props) 308 + { 309 + u64 __user *address = u64_to_user_ptr(extension); 310 + struct drm_xe_user_extension ext; 311 + int err; 312 + u32 idx; 313 + 314 + if (XE_IOCTL_DBG(xe, ext_number >= MAX_USER_EXTENSIONS)) 315 + return -E2BIG; 316 + 317 + err = __copy_from_user(&ext, address, sizeof(ext)); 318 + if (XE_IOCTL_DBG(xe, err)) 319 + return -EFAULT; 320 + 321 + if (XE_IOCTL_DBG(xe, ext.pad) || 322 + XE_IOCTL_DBG(xe, ext.name >= ARRAY_SIZE(xe_eu_stall_user_extension_funcs))) 323 + return -EINVAL; 324 + 325 + idx = array_index_nospec(ext.name, ARRAY_SIZE(xe_eu_stall_user_extension_funcs)); 326 + err = xe_eu_stall_user_extension_funcs[idx](xe, extension, props); 327 + if (XE_IOCTL_DBG(xe, err)) 328 + return err; 329 + 330 + if (ext.next_extension) 331 + return xe_eu_stall_user_extensions(xe, ext.next_extension, ++ext_number, props); 332 + 333 + return 0; 334 + } 335 + 336 + /** 337 + * buf_data_size - Calculate the number of bytes in a circular buffer 338 + * given the read and write pointers and the size of 339 + * the buffer. 340 + * 341 + * @buf_size: Size of the circular buffer 342 + * @read_ptr: Read pointer with an additional overflow bit 343 + * @write_ptr: Write pointer with an additional overflow bit 344 + * 345 + * Since the read and write pointers have an additional overflow bit, 346 + * this function calculates the offsets from the pointers and use the 347 + * offsets to calculate the data size in the buffer. 
348 + * 349 + * Returns: number of bytes of data in the buffer 350 + */ 351 + static u32 buf_data_size(size_t buf_size, u32 read_ptr, u32 write_ptr) 352 + { 353 + u32 read_offset, write_offset, size = 0; 354 + 355 + if (read_ptr == write_ptr) 356 + goto exit; 357 + 358 + read_offset = read_ptr & (buf_size - 1); 359 + write_offset = write_ptr & (buf_size - 1); 360 + 361 + if (write_offset > read_offset) 362 + size = write_offset - read_offset; 363 + else 364 + size = buf_size - read_offset + write_offset; 365 + exit: 366 + return size; 367 + } 368 + 369 + /** 370 + * eu_stall_data_buf_poll - Poll for EU stall data in the buffer. 371 + * 372 + * @stream: xe EU stall data stream instance 373 + * 374 + * Returns: true if the EU stall buffer contains minimum stall data as 375 + * specified by the event report count, else false. 376 + */ 377 + static bool eu_stall_data_buf_poll(struct xe_eu_stall_data_stream *stream) 378 + { 379 + u32 read_ptr, write_ptr_reg, write_ptr, total_data = 0; 380 + u32 buf_size = stream->per_xecore_buf_size; 381 + struct per_xecore_buf *xecore_buf; 382 + struct xe_gt *gt = stream->gt; 383 + bool min_data_present = false; 384 + u16 group, instance; 385 + unsigned int xecore; 386 + 387 + mutex_lock(&gt->eu_stall->stream_lock); 388 + for_each_dss_steering(xecore, gt, group, instance) { 389 + xecore_buf = &stream->xecore_buf[xecore]; 390 + read_ptr = xecore_buf->read; 391 + write_ptr_reg = xe_gt_mcr_unicast_read(gt, XEHPC_EUSTALL_REPORT, 392 + group, instance); 393 + write_ptr = REG_FIELD_GET(XEHPC_EUSTALL_REPORT_WRITE_PTR_MASK, write_ptr_reg); 394 + write_ptr <<= 6; 395 + write_ptr &= ((buf_size << 1) - 1); 396 + if (!min_data_present) { 397 + total_data += buf_data_size(buf_size, read_ptr, write_ptr); 398 + if (num_data_rows(total_data) >= stream->wait_num_reports) 399 + min_data_present = true; 400 + } 401 + if (write_ptr_reg & XEHPC_EUSTALL_REPORT_OVERFLOW_DROP) 402 + set_bit(xecore, stream->data_drop.mask); 403 + xecore_buf->write = write_ptr; 
404 + } 405 + mutex_unlock(&gt->eu_stall->stream_lock); 406 + 407 + return min_data_present; 408 + } 409 + 410 + static void clear_dropped_eviction_line_bit(struct xe_gt *gt, u16 group, u16 instance) 411 + { 412 + struct xe_device *xe = gt_to_xe(gt); 413 + u32 write_ptr_reg; 414 + 415 + /* On PVC, the overflow bit has to be cleared by writing 1 to it. 416 + * On Xe2 and later GPUs, the bit has to be cleared by writing 0 to it. 417 + */ 418 + if (GRAPHICS_VER(xe) >= 20) 419 + write_ptr_reg = _MASKED_BIT_DISABLE(XEHPC_EUSTALL_REPORT_OVERFLOW_DROP); 420 + else 421 + write_ptr_reg = _MASKED_BIT_ENABLE(XEHPC_EUSTALL_REPORT_OVERFLOW_DROP); 422 + 423 + xe_gt_mcr_unicast_write(gt, XEHPC_EUSTALL_REPORT, write_ptr_reg, group, instance); 424 + } 425 + 426 + static int xe_eu_stall_data_buf_read(struct xe_eu_stall_data_stream *stream, 427 + char __user *buf, size_t count, 428 + size_t *total_data_size, struct xe_gt *gt, 429 + u16 group, u16 instance, unsigned int xecore) 430 + { 431 + size_t read_data_size, copy_size, buf_size; 432 + u32 read_ptr_reg, read_ptr, write_ptr; 433 + u8 *xecore_start_vaddr, *read_vaddr; 434 + struct per_xecore_buf *xecore_buf; 435 + u32 read_offset, write_offset; 436 + 437 + /* Hardware increments the read and write pointers such that they can 438 + * overflow into one additional bit. For example, a 256KB size buffer 439 + * offset pointer needs 18 bits. But HW uses 19 bits for the read and 440 + * write pointers. This technique avoids wasting a slot in the buffer. 441 + * Read and write offsets are calculated from the pointers in order to 442 + * check if the write pointer has wrapped around the array. 
443 + */ 444 + xecore_buf = &stream->xecore_buf[xecore]; 445 + xecore_start_vaddr = xecore_buf->vaddr; 446 + read_ptr = xecore_buf->read; 447 + write_ptr = xecore_buf->write; 448 + buf_size = stream->per_xecore_buf_size; 449 + 450 + read_data_size = buf_data_size(buf_size, read_ptr, write_ptr); 451 + /* Read only the data that the user space buffer can accommodate */ 452 + read_data_size = min_t(size_t, count - *total_data_size, read_data_size); 453 + if (read_data_size == 0) 454 + goto exit_drop; 455 + 456 + read_offset = read_ptr & (buf_size - 1); 457 + write_offset = write_ptr & (buf_size - 1); 458 + read_vaddr = xecore_start_vaddr + read_offset; 459 + 460 + if (write_offset > read_offset) { 461 + if (copy_to_user(buf + *total_data_size, read_vaddr, read_data_size)) 462 + return -EFAULT; 463 + } else { 464 + if (read_data_size >= buf_size - read_offset) 465 + copy_size = buf_size - read_offset; 466 + else 467 + copy_size = read_data_size; 468 + if (copy_to_user(buf + *total_data_size, read_vaddr, copy_size)) 469 + return -EFAULT; 470 + if (copy_to_user(buf + *total_data_size + copy_size, 471 + xecore_start_vaddr, read_data_size - copy_size)) 472 + return -EFAULT; 473 + } 474 + 475 + *total_data_size += read_data_size; 476 + read_ptr += read_data_size; 477 + 478 + /* Read pointer can overflow into one additional bit */ 479 + read_ptr &= (buf_size << 1) - 1; 480 + read_ptr_reg = REG_FIELD_PREP(XEHPC_EUSTALL_REPORT1_READ_PTR_MASK, (read_ptr >> 6)); 481 + read_ptr_reg = _MASKED_FIELD(XEHPC_EUSTALL_REPORT1_READ_PTR_MASK, read_ptr_reg); 482 + xe_gt_mcr_unicast_write(gt, XEHPC_EUSTALL_REPORT1, read_ptr_reg, group, instance); 483 + xecore_buf->read = read_ptr; 484 + trace_xe_eu_stall_data_read(group, instance, read_ptr, write_ptr, 485 + read_data_size, *total_data_size); 486 + exit_drop: 487 + /* Clear drop bit (if set) after any data was read or if the buffer was empty. 
488 + * Drop bit can be set even if the buffer is empty as the buffer may have been emptied 489 + * in the previous read() and the data drop bit was set during the previous read(). 490 + */ 491 + if (test_bit(xecore, stream->data_drop.mask)) { 492 + clear_dropped_eviction_line_bit(gt, group, instance); 493 + clear_bit(xecore, stream->data_drop.mask); 494 + } 495 + return 0; 496 + } 497 + 498 + /** 499 + * xe_eu_stall_stream_read_locked - copy EU stall counters data from the 500 + * per xecore buffers to the userspace buffer 501 + * @stream: A stream opened for EU stall count metrics 502 + * @file: An xe EU stall data stream file 503 + * @buf: destination buffer given by userspace 504 + * @count: the number of bytes userspace wants to read 505 + * 506 + * Returns: Number of bytes copied or a negative error code 507 + * If we've successfully copied any data then reporting that takes 508 + * precedence over any internal error status, so the data isn't lost. 509 + */ 510 + static ssize_t xe_eu_stall_stream_read_locked(struct xe_eu_stall_data_stream *stream, 511 + struct file *file, char __user *buf, 512 + size_t count) 513 + { 514 + struct xe_gt *gt = stream->gt; 515 + size_t total_size = 0; 516 + u16 group, instance; 517 + unsigned int xecore; 518 + int ret = 0; 519 + 520 + if (bitmap_weight(stream->data_drop.mask, XE_MAX_DSS_FUSE_BITS)) { 521 + if (!stream->data_drop.reported_to_user) { 522 + stream->data_drop.reported_to_user = true; 523 + xe_gt_dbg(gt, "EU stall data dropped in XeCores: %*pb\n", 524 + XE_MAX_DSS_FUSE_BITS, stream->data_drop.mask); 525 + return -EIO; 526 + } 527 + stream->data_drop.reported_to_user = false; 528 + } 529 + 530 + for_each_dss_steering(xecore, gt, group, instance) { 531 + ret = xe_eu_stall_data_buf_read(stream, buf, count, &total_size, 532 + gt, group, instance, xecore); 533 + if (ret || count == total_size) 534 + break; 535 + } 536 + return total_size ?: (ret ?: -EAGAIN); 537 + } 538 + 539 + /* 540 + * Userspace must enable the EU 
stall stream with DRM_XE_OBSERVATION_IOCTL_ENABLE 541 + * before calling read(). 542 + * 543 + * Returns: The number of bytes copied or a negative error code on failure. 544 + * -EIO if HW drops any EU stall data when the buffer is full. 545 + */ 546 + static ssize_t xe_eu_stall_stream_read(struct file *file, char __user *buf, 547 + size_t count, loff_t *ppos) 548 + { 549 + struct xe_eu_stall_data_stream *stream = file->private_data; 550 + struct xe_gt *gt = stream->gt; 551 + ssize_t ret, aligned_count; 552 + 553 + aligned_count = ALIGN_DOWN(count, stream->data_record_size); 554 + if (aligned_count == 0) 555 + return -EINVAL; 556 + 557 + if (!stream->enabled) { 558 + xe_gt_dbg(gt, "EU stall data stream not enabled to read\n"); 559 + return -EINVAL; 560 + } 561 + 562 + if (!(file->f_flags & O_NONBLOCK)) { 563 + do { 564 + ret = wait_event_interruptible(stream->poll_wq, stream->pollin); 565 + if (ret) 566 + return -EINTR; 567 + 568 + mutex_lock(&gt->eu_stall->stream_lock); 569 + ret = xe_eu_stall_stream_read_locked(stream, file, buf, aligned_count); 570 + mutex_unlock(&gt->eu_stall->stream_lock); 571 + } while (ret == -EAGAIN); 572 + } else { 573 + mutex_lock(&gt->eu_stall->stream_lock); 574 + ret = xe_eu_stall_stream_read_locked(stream, file, buf, aligned_count); 575 + mutex_unlock(&gt->eu_stall->stream_lock); 576 + } 577 + 578 + /* 579 + * This may not work correctly if the user buffer is very small. 580 + * We don't want to block the next read() when there is data in the buffer 581 + * now, but couldn't be accommodated in the small user buffer. 
582 + */ 583 + stream->pollin = false; 584 + 585 + return ret; 586 + } 587 + 588 + static void xe_eu_stall_stream_free(struct xe_eu_stall_data_stream *stream) 589 + { 590 + struct xe_gt *gt = stream->gt; 591 + 592 + gt->eu_stall->stream = NULL; 593 + kfree(stream); 594 + } 595 + 596 + static void xe_eu_stall_data_buf_destroy(struct xe_eu_stall_data_stream *stream) 597 + { 598 + xe_bo_unpin_map_no_vm(stream->bo); 599 + kfree(stream->xecore_buf); 600 + } 601 + 602 + static int xe_eu_stall_data_buf_alloc(struct xe_eu_stall_data_stream *stream, 603 + u16 last_xecore) 604 + { 605 + struct xe_tile *tile = stream->gt->tile; 606 + struct xe_bo *bo; 607 + u32 size; 608 + 609 + stream->xecore_buf = kcalloc(last_xecore, sizeof(*stream->xecore_buf), GFP_KERNEL); 610 + if (!stream->xecore_buf) 611 + return -ENOMEM; 612 + 613 + size = stream->per_xecore_buf_size * last_xecore; 614 + 615 + bo = xe_bo_create_pin_map_at_aligned(tile->xe, tile, NULL, 616 + size, ~0ull, ttm_bo_type_kernel, 617 + XE_BO_FLAG_SYSTEM | XE_BO_FLAG_GGTT, SZ_64); 618 + if (IS_ERR(bo)) { 619 + kfree(stream->xecore_buf); 620 + return PTR_ERR(bo); 621 + } 622 + 623 + XE_WARN_ON(!IS_ALIGNED(xe_bo_ggtt_addr(bo), SZ_64)); 624 + stream->bo = bo; 625 + 626 + return 0; 627 + } 628 + 629 + static int xe_eu_stall_stream_enable(struct xe_eu_stall_data_stream *stream) 630 + { 631 + u32 write_ptr_reg, write_ptr, read_ptr_reg, reg_value; 632 + struct per_xecore_buf *xecore_buf; 633 + struct xe_gt *gt = stream->gt; 634 + u16 group, instance; 635 + unsigned int fw_ref; 636 + int xecore; 637 + 638 + /* Take runtime pm ref and forcewake to disable RC6 */ 639 + xe_pm_runtime_get(gt_to_xe(gt)); 640 + fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_RENDER); 641 + if (!xe_force_wake_ref_has_domain(fw_ref, XE_FW_RENDER)) { 642 + xe_gt_err(gt, "Failed to get RENDER forcewake\n"); 643 + xe_pm_runtime_put(gt_to_xe(gt)); 644 + return -ETIMEDOUT; 645 + } 646 + 647 + if (XE_WA(gt, 22016596838)) 648 + xe_gt_mcr_multicast_write(gt, 
ROW_CHICKEN2,
+					  _MASKED_BIT_ENABLE(DISABLE_DOP_GATING));
+
+	for_each_dss_steering(xecore, gt, group, instance) {
+		write_ptr_reg = xe_gt_mcr_unicast_read(gt, XEHPC_EUSTALL_REPORT, group, instance);
+		/* Clear any drop bits set and not cleared in the previous session. */
+		if (write_ptr_reg & XEHPC_EUSTALL_REPORT_OVERFLOW_DROP)
+			clear_dropped_eviction_line_bit(gt, group, instance);
+		write_ptr = REG_FIELD_GET(XEHPC_EUSTALL_REPORT_WRITE_PTR_MASK, write_ptr_reg);
+		read_ptr_reg = REG_FIELD_PREP(XEHPC_EUSTALL_REPORT1_READ_PTR_MASK, write_ptr);
+		read_ptr_reg = _MASKED_FIELD(XEHPC_EUSTALL_REPORT1_READ_PTR_MASK, read_ptr_reg);
+		/* Initialize the read pointer to the write pointer */
+		xe_gt_mcr_unicast_write(gt, XEHPC_EUSTALL_REPORT1, read_ptr_reg, group, instance);
+		write_ptr <<= 6;
+		write_ptr &= (stream->per_xecore_buf_size << 1) - 1;
+		xecore_buf = &stream->xecore_buf[xecore];
+		xecore_buf->write = write_ptr;
+		xecore_buf->read = write_ptr;
+	}
+	stream->data_drop.reported_to_user = false;
+	bitmap_zero(stream->data_drop.mask, XE_MAX_DSS_FUSE_BITS);
+
+	reg_value = _MASKED_FIELD(EUSTALL_MOCS | EUSTALL_SAMPLE_RATE,
+				  REG_FIELD_PREP(EUSTALL_MOCS, gt->mocs.uc_index << 1) |
+				  REG_FIELD_PREP(EUSTALL_SAMPLE_RATE,
+						 stream->sampling_rate_mult));
+	xe_gt_mcr_multicast_write(gt, XEHPC_EUSTALL_CTRL, reg_value);
+	/* GGTT addresses can never be > 32 bits */
+	xe_gt_mcr_multicast_write(gt, XEHPC_EUSTALL_BASE_UPPER, 0);
+	reg_value = xe_bo_ggtt_addr(stream->bo);
+	reg_value |= REG_FIELD_PREP(XEHPC_EUSTALL_BASE_XECORE_BUF_SZ,
+				    stream->per_xecore_buf_size / SZ_256K);
+	reg_value |= XEHPC_EUSTALL_BASE_ENABLE_SAMPLING;
+	xe_gt_mcr_multicast_write(gt, XEHPC_EUSTALL_BASE, reg_value);
+
+	return 0;
+}
+
+static void eu_stall_data_buf_poll_work_fn(struct work_struct *work)
+{
+	struct xe_eu_stall_data_stream *stream =
+		container_of(work, typeof(*stream), buf_poll_work.work);
+	struct xe_gt *gt = stream->gt;
+
+	if (eu_stall_data_buf_poll(stream)) {
+		stream->pollin = true;
+		wake_up(&stream->poll_wq);
+	}
+	queue_delayed_work(gt->eu_stall->buf_ptr_poll_wq,
+			   &stream->buf_poll_work,
+			   msecs_to_jiffies(POLL_PERIOD_MS));
+}
+
+static int xe_eu_stall_stream_init(struct xe_eu_stall_data_stream *stream,
+				   struct eu_stall_open_properties *props)
+{
+	unsigned int max_wait_num_reports, xecore, last_xecore, num_xecores;
+	struct per_xecore_buf *xecore_buf;
+	struct xe_gt *gt = stream->gt;
+	xe_dss_mask_t all_xecores;
+	u16 group, instance;
+	u32 vaddr_offset;
+	int ret;
+
+	bitmap_or(all_xecores, gt->fuse_topo.g_dss_mask, gt->fuse_topo.c_dss_mask,
+		  XE_MAX_DSS_FUSE_BITS);
+	num_xecores = bitmap_weight(all_xecores, XE_MAX_DSS_FUSE_BITS);
+	last_xecore = xe_gt_topology_mask_last_dss(all_xecores) + 1;
+
+	max_wait_num_reports = num_data_rows(per_xecore_buf_size * num_xecores);
+	if (props->wait_num_reports == 0 || props->wait_num_reports > max_wait_num_reports) {
+		xe_gt_dbg(gt, "Invalid EU stall event report count %u\n",
+			  props->wait_num_reports);
+		xe_gt_dbg(gt, "Minimum event report count is 1, maximum is %u\n",
+			  max_wait_num_reports);
+		return -EINVAL;
+	}
+
+	init_waitqueue_head(&stream->poll_wq);
+	INIT_DELAYED_WORK(&stream->buf_poll_work, eu_stall_data_buf_poll_work_fn);
+	stream->per_xecore_buf_size = per_xecore_buf_size;
+	stream->sampling_rate_mult = props->sampling_rate_mult;
+	stream->wait_num_reports = props->wait_num_reports;
+	stream->data_record_size = xe_eu_stall_data_record_size(gt_to_xe(gt));
+
+	ret = xe_eu_stall_data_buf_alloc(stream, last_xecore);
+	if (ret)
+		return ret;
+
+	for_each_dss_steering(xecore, gt, group, instance) {
+		xecore_buf = &stream->xecore_buf[xecore];
+		vaddr_offset = xecore * stream->per_xecore_buf_size;
+		xecore_buf->vaddr = stream->bo->vmap.vaddr + vaddr_offset;
+	}
+	return 0;
+}
+
+static __poll_t xe_eu_stall_stream_poll_locked(struct xe_eu_stall_data_stream *stream,
+					       struct file *file, poll_table *wait)
+{
+	__poll_t events = 0;
+
+	poll_wait(file, &stream->poll_wq, wait);
+
+	if (stream->pollin)
+		events |= EPOLLIN;
+
+	return events;
+}
+
+static __poll_t xe_eu_stall_stream_poll(struct file *file, poll_table *wait)
+{
+	struct xe_eu_stall_data_stream *stream = file->private_data;
+	struct xe_gt *gt = stream->gt;
+	__poll_t ret;
+
+	mutex_lock(&gt->eu_stall->stream_lock);
+	ret = xe_eu_stall_stream_poll_locked(stream, file, wait);
+	mutex_unlock(&gt->eu_stall->stream_lock);
+
+	return ret;
+}
+
+static int xe_eu_stall_enable_locked(struct xe_eu_stall_data_stream *stream)
+{
+	struct xe_gt *gt = stream->gt;
+	int ret = 0;
+
+	if (stream->enabled)
+		return ret;
+
+	stream->enabled = true;
+
+	ret = xe_eu_stall_stream_enable(stream);
+
+	queue_delayed_work(gt->eu_stall->buf_ptr_poll_wq,
+			   &stream->buf_poll_work,
+			   msecs_to_jiffies(POLL_PERIOD_MS));
+	return ret;
+}
+
+static int xe_eu_stall_disable_locked(struct xe_eu_stall_data_stream *stream)
+{
+	struct xe_gt *gt = stream->gt;
+
+	if (!stream->enabled)
+		return 0;
+
+	stream->enabled = false;
+
+	xe_gt_mcr_multicast_write(gt, XEHPC_EUSTALL_BASE, 0);
+
+	cancel_delayed_work_sync(&stream->buf_poll_work);
+
+	if (XE_WA(gt, 22016596838))
+		xe_gt_mcr_multicast_write(gt, ROW_CHICKEN2,
+					  _MASKED_BIT_DISABLE(DISABLE_DOP_GATING));
+
+	xe_force_wake_put(gt_to_fw(gt), XE_FW_RENDER);
+	xe_pm_runtime_put(gt_to_xe(gt));
+
+	return 0;
+}
+
+static long xe_eu_stall_stream_ioctl_locked(struct xe_eu_stall_data_stream *stream,
+					    unsigned int cmd, unsigned long arg)
+{
+	switch (cmd) {
+	case DRM_XE_OBSERVATION_IOCTL_ENABLE:
+		return xe_eu_stall_enable_locked(stream);
+	case DRM_XE_OBSERVATION_IOCTL_DISABLE:
+		return xe_eu_stall_disable_locked(stream);
+	}
+
+	return -EINVAL;
+}
+
+static long xe_eu_stall_stream_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+	struct xe_eu_stall_data_stream *stream = file->private_data;
+	struct xe_gt *gt = stream->gt;
+	long ret;
+
+	mutex_lock(&gt->eu_stall->stream_lock);
+	ret = xe_eu_stall_stream_ioctl_locked(stream, cmd, arg);
+	mutex_unlock(&gt->eu_stall->stream_lock);
+
+	return ret;
+}
+
+static int xe_eu_stall_stream_close(struct inode *inode, struct file *file)
+{
+	struct xe_eu_stall_data_stream *stream = file->private_data;
+	struct xe_gt *gt = stream->gt;
+
+	drm_dev_put(&gt->tile->xe->drm);
+
+	mutex_lock(&gt->eu_stall->stream_lock);
+	xe_eu_stall_disable_locked(stream);
+	xe_eu_stall_data_buf_destroy(stream);
+	xe_eu_stall_stream_free(stream);
+	mutex_unlock(&gt->eu_stall->stream_lock);
+
+	return 0;
+}
+
+static const struct file_operations fops_eu_stall = {
+	.owner		= THIS_MODULE,
+	.llseek		= noop_llseek,
+	.release	= xe_eu_stall_stream_close,
+	.poll		= xe_eu_stall_stream_poll,
+	.read		= xe_eu_stall_stream_read,
+	.unlocked_ioctl	= xe_eu_stall_stream_ioctl,
+	.compat_ioctl	= xe_eu_stall_stream_ioctl,
+};
+
+static int xe_eu_stall_stream_open_locked(struct drm_device *dev,
+					  struct eu_stall_open_properties *props,
+					  struct drm_file *file)
+{
+	struct xe_eu_stall_data_stream *stream;
+	struct xe_gt *gt = props->gt;
+	unsigned long f_flags = 0;
+	int ret, stream_fd;
+
+	/* Only one session can be active at any time */
+	if (gt->eu_stall->stream) {
+		xe_gt_dbg(gt, "EU stall sampling session already active\n");
+		return -EBUSY;
+	}
+
+	stream = kzalloc(sizeof(*stream), GFP_KERNEL);
+	if (!stream)
+		return -ENOMEM;
+
+	gt->eu_stall->stream = stream;
+	stream->gt = gt;
+
+	ret = xe_eu_stall_stream_init(stream, props);
+	if (ret) {
+		xe_gt_dbg(gt, "EU stall stream init failed : %d\n", ret);
+		goto err_free;
+	}
+
+	stream_fd = anon_inode_getfd("[xe_eu_stall]", &fops_eu_stall, stream, f_flags);
+	if (stream_fd < 0) {
+		ret = stream_fd;
+		xe_gt_dbg(gt, "EU stall inode get fd failed : %d\n", ret);
+		goto err_destroy;
+	}
+
+	/* Take a reference on the driver that will be kept with stream_fd
+	 * until its release.
+	 */
+	drm_dev_get(&gt->tile->xe->drm);
+
+	return stream_fd;
+
+err_destroy:
+	xe_eu_stall_data_buf_destroy(stream);
+err_free:
+	xe_eu_stall_stream_free(stream);
+	return ret;
+}
+
+/**
+ * xe_eu_stall_stream_open - Open a xe EU stall data stream fd
+ *
+ * @dev: DRM device pointer
+ * @data: pointer to first struct @drm_xe_ext_set_property in
+ *	  the chain of input properties from the user space.
+ * @file: DRM file pointer
+ *
+ * This function opens a EU stall data stream with input properties from
+ * the user space.
+ *
+ * Returns: EU stall data stream fd on success or a negative error code.
+ */
+int xe_eu_stall_stream_open(struct drm_device *dev, u64 data, struct drm_file *file)
+{
+	struct xe_device *xe = to_xe_device(dev);
+	struct eu_stall_open_properties props = {};
+	int ret;
+
+	if (!xe_eu_stall_supported_on_platform(xe)) {
+		drm_dbg(&xe->drm, "EU stall monitoring is not supported on this platform\n");
+		return -ENODEV;
+	}
+
+	if (xe_observation_paranoid && !perfmon_capable()) {
+		drm_dbg(&xe->drm, "Insufficient privileges for EU stall monitoring\n");
+		return -EACCES;
+	}
+
+	/* Initialize and set default values */
+	props.wait_num_reports = 1;
+	props.sampling_rate_mult = 4;
+
+	ret = xe_eu_stall_user_extensions(xe, data, 0, &props);
+	if (ret)
+		return ret;
+
+	if (!props.gt) {
+		drm_dbg(&xe->drm, "GT ID not provided for EU stall sampling\n");
+		return -EINVAL;
+	}
+
+	mutex_lock(&props.gt->eu_stall->stream_lock);
+	ret = xe_eu_stall_stream_open_locked(dev, &props, file);
+	mutex_unlock(&props.gt->eu_stall->stream_lock);
+
+	return ret;
+}
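
The stream-enable code above turns the hardware write pointer into a byte offset with `write_ptr <<= 6` followed by a mask of `(per_xecore_buf_size << 1) - 1`. A minimal userspace sketch of that arithmetic follows. The helper name `hw_ptr_to_offset` is hypothetical, and the reading that the pointer counts 64-byte units and that the one extra bit distinguishes a full buffer from an empty one is an assumption, not something the diff states.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a hardware pointer (assumed to count
 * 64-byte units) into a byte offset, keeping one bit above the buffer
 * size. buf_size must be a non-zero power of two.
 */
static uint32_t hw_ptr_to_offset(uint32_t hw_ptr, uint32_t buf_size)
{
	assert(buf_size && (buf_size & (buf_size - 1)) == 0);

	/* Same two steps as the driver: shift by 6, mask to 2 * buf_size. */
	return (hw_ptr << 6) & ((buf_size << 1) - 1);
}
```

With a 256 KiB buffer, a pointer value of `0x1000` maps to offset `0x40000` (the buffer size itself, i.e. only the extra bit set), and `0x2000` wraps back to 0.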

+24

drivers/gpu/drm/xe/xe_eu_stall.h

@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2025 Intel Corporation
+ */
+
+#ifndef __XE_EU_STALL_H__
+#define __XE_EU_STALL_H__
+
+#include "xe_gt_types.h"
+
+size_t xe_eu_stall_get_per_xecore_buf_size(void);
+size_t xe_eu_stall_data_record_size(struct xe_device *xe);
+size_t xe_eu_stall_get_sampling_rates(u32 *num_rates, const u64 **rates);
+
+int xe_eu_stall_init(struct xe_gt *gt);
+int xe_eu_stall_stream_open(struct drm_device *dev,
+			    u64 data,
+			    struct drm_file *file);
+
+static inline bool xe_eu_stall_supported_on_platform(struct xe_device *xe)
+{
+	return xe->info.platform == XE_PVC || GRAPHICS_VER(xe) >= 20;
+}
+#endif

+8 -3

drivers/gpu/drm/xe/xe_exec_queue.c

@@ -203,6 +203,7 @@
 	__xe_exec_queue_free(q);
 	return ERR_PTR(err);
 }
+ALLOW_ERROR_INJECTION(xe_exec_queue_create, ERRNO);
 
 struct xe_exec_queue *xe_exec_queue_create_class(struct xe_device *xe, struct xe_gt *gt,
 						 struct xe_vm *vm,
@@ -604,11 +605,12 @@
 	struct xe_tile *tile;
 	struct xe_exec_queue *q = NULL;
 	u32 logical_mask;
+	u32 flags = 0;
 	u32 id;
 	u32 len;
 	int err;
 
-	if (XE_IOCTL_DBG(xe, args->flags) ||
+	if (XE_IOCTL_DBG(xe, args->flags & ~DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT) ||
 	    XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
 		return -EINVAL;
 
@@ -625,6 +627,9 @@
 	if (XE_IOCTL_DBG(xe, eci[0].gt_id >= xe->info.gt_count))
 		return -EINVAL;
 
+	if (args->flags & DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT)
+		flags |= EXEC_QUEUE_FLAG_LOW_LATENCY;
+
 	if (eci[0].engine_class == DRM_XE_ENGINE_CLASS_VM_BIND) {
 		if (XE_IOCTL_DBG(xe, args->width != 1) ||
 		    XE_IOCTL_DBG(xe, args->num_placements != 1) ||
@@ -633,8 +638,8 @@
 
 		for_each_tile(tile, xe, id) {
 			struct xe_exec_queue *new;
-			u32 flags = EXEC_QUEUE_FLAG_VM;
 
+			flags |= EXEC_QUEUE_FLAG_VM;
 			if (id)
 				flags |= EXEC_QUEUE_FLAG_BIND_ENGINE_CHILD;
 
@@ -680,7 +685,7 @@
 		}
 
 		q = xe_exec_queue_create(xe, vm, logical_mask,
-					 args->width, hwe, 0,
+					 args->width, hwe, flags,
 					 args->extensions);
 		up_read(&vm->lock);
 		xe_vm_put(vm);
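
The ioctl change above stops rejecting every non-zero `args->flags` and instead rejects only bits outside the one supported hint, then maps that hint onto an internal queue flag. A minimal userspace sketch of the same mask-and-translate pattern is shown below. The names `UAPI_LOW_LATENCY_HINT`, `QUEUE_FLAG_LOW_LATENCY` and `translate_user_flags` are hypothetical, and the bit position chosen for the uAPI hint is an assumption; only the internal `BIT(5)` value appears in this diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the uAPI hint and the internal queue flag. */
#define UAPI_LOW_LATENCY_HINT	(1u << 0)	/* assumed bit position */
#define QUEUE_FLAG_LOW_LATENCY	(1u << 5)	/* mirrors BIT(5) in the diff */

/*
 * Reject any user flag outside the supported set, then translate the
 * accepted hint into the internal flag. Returns 0 or -EINVAL.
 */
static int translate_user_flags(uint32_t user_flags, uint32_t *queue_flags)
{
	assert(queue_flags);

	if (user_flags & ~UAPI_LOW_LATENCY_HINT)
		return -EINVAL;

	*queue_flags = 0;
	if (user_flags & UAPI_LOW_LATENCY_HINT)
		*queue_flags |= QUEUE_FLAG_LOW_LATENCY;

	return 0;
}
```

Masking with the complement of the supported set keeps unknown bits an error, so a later kernel can assign them a meaning without silently changing behavior for old callers.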

+2

drivers/gpu/drm/xe/xe_exec_queue_types.h

@@ -85,6 +85,8 @@
 #define EXEC_QUEUE_FLAG_BIND_ENGINE_CHILD	BIT(3)
 /* kernel exec_queue only, set priority to highest level */
 #define EXEC_QUEUE_FLAG_HIGH_PRIORITY		BIT(4)
+/* flag to indicate low latency hint to guc */
+#define EXEC_QUEUE_FLAG_LOW_LATENCY		BIT(5)
 
 	/**
 	 * @flags: flags for this exec queue, should statically setup aside from ban

+3 -3

drivers/gpu/drm/xe/xe_gen_wa_oob.c

@@ -28,10 +28,10 @@
 	"\n" \
 	"#endif\n"
 
-static void print_usage(FILE *f)
+static void print_usage(FILE *f, const char *progname)
 {
 	fprintf(f, "usage: %s <input-rule-file> <generated-c-source-file> <generated-c-header-file>\n",
-		program_invocation_short_name);
+		progname);
 }
 
 static void print_parse_error(const char *err_msg, const char *line,
@@ -144,7 +144,7 @@
 
 	if (argc < 3) {
 		fprintf(stderr, "ERROR: wrong arguments\n");
-		print_usage(stderr);
+		print_usage(stderr, argv[0]);
 		return 1;
 	}
 

+1 -1

drivers/gpu/drm/xe/xe_gsc_proxy.c

@@ -490,7 +490,7 @@
 
 	gsc->proxy.component_added = true;
 
-	return xe_device_add_action_or_reset(xe, xe_gsc_proxy_remove, gsc);
+	return devm_add_action_or_reset(xe->drm.dev, xe_gsc_proxy_remove, gsc);
 }
 
 /**

+11 -2

drivers/gpu/drm/xe/xe_gt.c

@@ -19,6 +19,7 @@
 #include "xe_bb.h"
 #include "xe_bo.h"
 #include "xe_device.h"
+#include "xe_eu_stall.h"
 #include "xe_exec_queue.h"
 #include "xe_execlist.h"
 #include "xe_force_wake.h"
@@ -361,9 +362,11 @@
 	if (err)
 		return err;
 
-	xe_wa_process_gt(gt);
+	err = xe_tuning_init(gt);
+	if (err)
+		return err;
+
 	xe_wa_process_oob(gt);
-	xe_tuning_process_gt(gt);
 
 	xe_force_wake_init_gt(gt, gt_to_fw(gt));
 	spin_lock_init(&gt->global_invl_lock);
@@ -450,6 +453,8 @@
 	}
 
 	xe_gt_mcr_set_implicit_defaults(gt);
+	xe_wa_process_gt(gt);
+	xe_tuning_process_gt(gt);
 	xe_reg_sr_apply_mmio(&gt->reg_sr, gt);
 
 	err = xe_gt_clock_init(gt);
@@ -612,6 +617,10 @@
 		return err;
 
 	xe_gt_record_user_engines(gt);
+
+	err = xe_eu_stall_init(gt);
+	if (err)
+		return err;
 
 	return 0;
 }

+21 -32

drivers/gpu/drm/xe/xe_gt_clock.c

@@ -12,24 +12,9 @@
 #include "xe_assert.h"
 #include "xe_device.h"
 #include "xe_gt.h"
+#include "xe_gt_printk.h"
 #include "xe_macros.h"
 #include "xe_mmio.h"
-
-static u32 read_reference_ts_freq(struct xe_gt *gt)
-{
-	u32 ts_override = xe_mmio_read32(&gt->mmio, TIMESTAMP_OVERRIDE);
-	u32 base_freq, frac_freq;
-
-	base_freq = REG_FIELD_GET(TIMESTAMP_OVERRIDE_US_COUNTER_DIVIDER_MASK,
-				  ts_override) + 1;
-	base_freq *= 1000000;
-
-	frac_freq = REG_FIELD_GET(TIMESTAMP_OVERRIDE_US_COUNTER_DENOMINATOR_MASK,
-				  ts_override);
-	frac_freq = 1000000 / (frac_freq + 1);
-
-	return base_freq + frac_freq;
-}
 
 static u32 get_crystal_clock_freq(u32 rpm_config_reg)
 {
@@ -57,26 +42,30 @@
 
 int xe_gt_clock_init(struct xe_gt *gt)
 {
-	u32 ctc_reg = xe_mmio_read32(&gt->mmio, CTC_MODE);
+	u32 c0 = xe_mmio_read32(&gt->mmio, RPM_CONFIG0);
 	u32 freq = 0;
 
-	/* Assuming gen11+ so assert this assumption is correct */
-	xe_gt_assert(gt, GRAPHICS_VER(gt_to_xe(gt)) >= 11);
+	/*
+	 * CTC_MODE[0] = 1 is definitely not supported for Xe2 and later
+	 * platforms. In theory it could be a valid setting for pre-Xe2
+	 * platforms, but there's no documentation on how to properly handle
+	 * this case. Reading TIMESTAMP_OVERRIDE, as the driver attempted in
+	 * the past has been confirmed as incorrect by the hardware architects.
+	 *
+	 * For now just warn if we ever encounter hardware in the wild that
+	 * has this setting and move on as if it hadn't been set.
+	 */
+	if (xe_mmio_read32(&gt->mmio, CTC_MODE) & CTC_SOURCE_DIVIDE_LOGIC)
+		xe_gt_warn(gt, "CTC_MODE[0] is set; this is unexpected and undocumented\n");
 
-	if (ctc_reg & CTC_SOURCE_DIVIDE_LOGIC) {
-		freq = read_reference_ts_freq(gt);
-	} else {
-		u32 c0 = xe_mmio_read32(&gt->mmio, RPM_CONFIG0);
+	freq = get_crystal_clock_freq(c0);
 
-		freq = get_crystal_clock_freq(c0);
-
-		/*
-		 * Now figure out how the command stream's timestamp
-		 * register increments from this frequency (it might
-		 * increment only every few clock cycle).
-		 */
-		freq >>= 3 - REG_FIELD_GET(RPM_CONFIG0_CTC_SHIFT_PARAMETER_MASK, c0);
-	}
+	/*
+	 * Now figure out how the command stream's timestamp
+	 * register increments from this frequency (it might
+	 * increment only every few clock cycle).
+	 */
+	freq >>= 3 - REG_FIELD_GET(RPM_CONFIG0_CTC_SHIFT_PARAMETER_MASK, c0);
 
 	gt->info.reference_clock = freq;
 	return 0;
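
After this change the reference clock is always the crystal frequency shifted right by `3 - shift`, where `shift` is the CTC shift parameter read from `RPM_CONFIG0`. The sketch below isolates that arithmetic in plain C. The function name `reference_clock_hz` is hypothetical, and the assumption that the shift parameter lies in the range 0..3 is inferred from the `3 - ...` expression rather than stated in the diff.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: derive the timestamp reference clock from the
 * crystal frequency. ctc_shift is assumed to be in 0..3, so the counter
 * ticks once every 2^(3 - ctc_shift) crystal cycles.
 */
static uint32_t reference_clock_hz(uint32_t crystal_hz, uint32_t ctc_shift)
{
	assert(ctc_shift <= 3);

	return crystal_hz >> (3 - ctc_shift);
}
```

For a 38.4 MHz crystal and a shift parameter of 0, the result is 4.8 MHz; with a shift parameter of 3 the timestamp ticks at the full crystal rate.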

+11

drivers/gpu/drm/xe/xe_gt_debugfs.c

@@ -30,6 +30,7 @@
 #include "xe_reg_sr.h"
 #include "xe_reg_whitelist.h"
 #include "xe_sriov.h"
+#include "xe_tuning.h"
 #include "xe_uc_debugfs.h"
 #include "xe_wa.h"
 
@@ -217,6 +218,15 @@
 	return 0;
 }
 
+static int tunings(struct xe_gt *gt, struct drm_printer *p)
+{
+	xe_pm_runtime_get(gt_to_xe(gt));
+	xe_tuning_dump(gt, p);
+	xe_pm_runtime_put(gt_to_xe(gt));
+
+	return 0;
+}
+
 static int pat(struct xe_gt *gt, struct drm_printer *p)
 {
 	xe_pm_runtime_get(gt_to_xe(gt));
@@ -300,6 +310,7 @@
 	{"powergate_info", .show = xe_gt_debugfs_simple_show, .data = powergate_info},
 	{"register-save-restore", .show = xe_gt_debugfs_simple_show, .data = register_save_restore},
 	{"workarounds", .show = xe_gt_debugfs_simple_show, .data = workarounds},
+	{"tunings", .show = xe_gt_debugfs_simple_show, .data = tunings},
 	{"pat", .show = xe_gt_debugfs_simple_show, .data = pat},
 	{"mocs", .show = xe_gt_debugfs_simple_show, .data = mocs},
 	{"default_lrc_rcs", .show = xe_gt_debugfs_simple_show, .data = rcs_default_lrc},

+14 -6

drivers/gpu/drm/xe/xe_gt_pagefault.c

@@ -19,6 +19,7 @@
 #include "xe_guc.h"
 #include "xe_guc_ct.h"
 #include "xe_migrate.h"
+#include "xe_svm.h"
 #include "xe_trace_bo.h"
 #include "xe_vm.h"
 
@@ -125,8 +126,8 @@
 	return 0;
 }
 
-static int handle_vma_pagefault(struct xe_gt *gt, struct pagefault *pf,
-				struct xe_vma *vma)
+static int handle_vma_pagefault(struct xe_gt *gt, struct xe_vma *vma,
+				bool atomic)
 {
 	struct xe_vm *vm = xe_vma_vm(vma);
 	struct xe_tile *tile = gt_to_tile(gt);
@@ -134,13 +135,13 @@
 	struct dma_fence *fence;
 	ktime_t end = 0;
 	int err;
-	bool atomic;
+
+	lockdep_assert_held_write(&vm->lock);
 
 	xe_gt_stats_incr(gt, XE_GT_STATS_ID_VMA_PAGEFAULT_COUNT, 1);
-	xe_gt_stats_incr(gt, XE_GT_STATS_ID_VMA_PAGEFAULT_BYTES, xe_vma_size(vma));
+	xe_gt_stats_incr(gt, XE_GT_STATS_ID_VMA_PAGEFAULT_KB, xe_vma_size(vma) / 1024);
 
 	trace_xe_vma_pagefault(vma);
-	atomic = access_is_atomic(pf->access_type);
 
 	/* Check if VMA is valid */
 	if (vma_is_valid(tile, vma) && !atomic)
@@ -210,6 +211,7 @@
 	struct xe_vm *vm;
 	struct xe_vma *vma = NULL;
 	int err;
+	bool atomic;
 
 	/* SW isn't expected to handle TRTT faults */
 	if (pf->trva_fault)
@@ -235,7 +237,13 @@
 		goto unlock_vm;
 	}
 
-	err = handle_vma_pagefault(gt, pf, vma);
+	atomic = access_is_atomic(pf->access_type);
+
+	if (xe_vma_is_cpu_addr_mirror(vma))
+		err = xe_svm_handle_pagefault(vm, vma, gt_to_tile(gt),
+					      pf->page_addr, atomic);
+	else
+		err = handle_vma_pagefault(gt, vma, atomic);
 
 unlock_vm:
 	if (!err)

-5

drivers/gpu/drm/xe/xe_gt_sriov_pf_service.c

@@ -114,7 +114,6 @@
 	GT_VEBOX_VDBOX_DISABLE,		/* _MMIO(0x9140) */
 	CTC_MODE,			/* _MMIO(0xa26c) */
 	HUC_KERNEL_LOAD_INFO,		/* _MMIO(0xc1dc) */
-	TIMESTAMP_OVERRIDE,		/* _MMIO(0x44074) */
 };
 
 static const struct xe_reg ats_m_runtime_regs[] = {
@@ -127,7 +126,6 @@
 	XEHP_GT_COMPUTE_DSS_ENABLE,	/* _MMIO(0x9144) */
 	CTC_MODE,			/* _MMIO(0xa26c) */
 	HUC_KERNEL_LOAD_INFO,		/* _MMIO(0xc1dc) */
-	TIMESTAMP_OVERRIDE,		/* _MMIO(0x44074) */
 };
 
 static const struct xe_reg pvc_runtime_regs[] = {
@@ -140,7 +138,6 @@
 	XEHPC_GT_COMPUTE_DSS_ENABLE_EXT,/* _MMIO(0x9148) */
 	CTC_MODE,			/* _MMIO(0xA26C) */
 	HUC_KERNEL_LOAD_INFO,		/* _MMIO(0xc1dc) */
-	TIMESTAMP_OVERRIDE,		/* _MMIO(0x44074) */
 };
 
 static const struct xe_reg ver_1270_runtime_regs[] = {
@@ -155,7 +152,6 @@
 	XEHPC_GT_COMPUTE_DSS_ENABLE_EXT,/* _MMIO(0x9148) */
 	CTC_MODE,			/* _MMIO(0xa26c) */
 	HUC_KERNEL_LOAD_INFO,		/* _MMIO(0xc1dc) */
-	TIMESTAMP_OVERRIDE,		/* _MMIO(0x44074) */
 };
 
 static const struct xe_reg ver_2000_runtime_regs[] = {
@@ -173,7 +169,6 @@
 	XE2_GT_GEOMETRY_DSS_2,		/* _MMIO(0x9154) */
 	CTC_MODE,			/* _MMIO(0xa26c) */
 	HUC_KERNEL_LOAD_INFO,		/* _MMIO(0xc1dc) */
-	TIMESTAMP_OVERRIDE,		/* _MMIO(0x44074) */
 };
 
 static const struct xe_reg ver_3000_runtime_regs[] = {

+8 -1

drivers/gpu/drm/xe/xe_gt_sriov_vf.c

@@ -47,12 +47,19 @@
 	return ret > 0 ? -EPROTO : ret;
 }
 
+#define GUC_RESET_VF_STATE_RETRY_MAX	10
 static int vf_reset_guc_state(struct xe_gt *gt)
 {
+	unsigned int retry = GUC_RESET_VF_STATE_RETRY_MAX;
 	struct xe_guc *guc = &gt->uc.guc;
 	int err;
 
-	err = guc_action_vf_reset(guc);
+	do {
+		err = guc_action_vf_reset(guc);
+		if (!err || err != -ETIMEDOUT)
+			break;
+	} while (--retry);
+
 	if (unlikely(err))
 		xe_gt_sriov_err(gt, "Failed to reset GuC state (%pe)\n", ERR_PTR(err));
 	return err;
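
The hunk above wraps the VF reset in a bounded loop that retries only when the action times out. A minimal userspace sketch of that pattern follows; `reset_with_retry`, `reset_action_fn` and the `flaky_action` test double are hypothetical names used for illustration, not driver API.

```c
#include <assert.h>
#include <errno.h>

#define RESET_RETRY_MAX	10	/* mirrors GUC_RESET_VF_STATE_RETRY_MAX */

/* Hypothetical action callback: returns 0 on success or a negative errno. */
typedef int (*reset_action_fn)(void *ctx);

/*
 * Call the action up to RESET_RETRY_MAX times, retrying only on
 * -ETIMEDOUT. Any other result, success included, ends the loop.
 */
static int reset_with_retry(reset_action_fn action, void *ctx)
{
	unsigned int retry = RESET_RETRY_MAX;
	int err;

	assert(action);

	do {
		err = action(ctx);
		if (err != -ETIMEDOUT)
			break;
	} while (--retry);

	return err;
}

/* Test double: times out until the counter behind ctx reaches zero. */
static int flaky_action(void *ctx)
{
	int *timeouts_left = ctx;

	if (*timeouts_left > 0) {
		(*timeouts_left)--;
		return -ETIMEDOUT;
	}
	return 0;
}
```

The sketch tests `err != -ETIMEDOUT` alone, which is logically equivalent to the `!err || err != -ETIMEDOUT` condition in the hunk, since a zero `err` already differs from `-ETIMEDOUT`.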

+4 -4

drivers/gpu/drm/xe/xe_gt_stats.c

@@ -23,13 +23,13 @@
 	if (id >= __XE_GT_STATS_NUM_IDS)
 		return;
 
-	atomic_add(incr, &gt->stats.counters[id]);
+	atomic64_add(incr, &gt->stats.counters[id]);
 }
 
 static const char *const stat_description[__XE_GT_STATS_NUM_IDS] = {
 	"tlb_inval_count",
 	"vma_pagefault_count",
-	"vma_pagefault_bytes",
+	"vma_pagefault_kb",
 };
 
 /**
@@ -44,8 +44,8 @@
 	enum xe_gt_stats_id id;
 
 	for (id = 0; id < __XE_GT_STATS_NUM_IDS; ++id)
-		drm_printf(p, "%s: %d\n", stat_description[id],
-			   atomic_read(&gt->stats.counters[id]));
+		drm_printf(p, "%s: %lld\n", stat_description[id],
+			   atomic64_read(&gt->stats.counters[id]));
 
 	return 0;
 }

+1 -1

drivers/gpu/drm/xe/xe_gt_stats_types.h

@@ -9,7 +9,7 @@
 enum xe_gt_stats_id {
 	XE_GT_STATS_ID_TLB_INVAL,
 	XE_GT_STATS_ID_VMA_PAGEFAULT_COUNT,
-	XE_GT_STATS_ID_VMA_PAGEFAULT_BYTES,
+	XE_GT_STATS_ID_VMA_PAGEFAULT_KB,
 	/* must be the last entry */
 	__XE_GT_STATS_NUM_IDS,
 };

+22

drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c

@@ -411,6 +411,28 @@
 }
 
 /**
+ * xe_gt_tlb_invalidation_vm - Issue a TLB invalidation on this GT for a VM
+ * @gt: graphics tile
+ * @vm: VM to invalidate
+ *
+ * Invalidate entire VM's address space
+ */
+void xe_gt_tlb_invalidation_vm(struct xe_gt *gt, struct xe_vm *vm)
+{
+	struct xe_gt_tlb_invalidation_fence fence;
+	u64 range = 1ull << vm->xe->info.va_bits;
+	int ret;
+
+	xe_gt_tlb_invalidation_fence_init(gt, &fence, true);
+
+	ret = xe_gt_tlb_invalidation_range(gt, &fence, 0, range, vm->usm.asid);
+	if (ret < 0)
+		return;
+
+	xe_gt_tlb_invalidation_fence_wait(&fence);
+}
+
+/**
  * xe_gt_tlb_invalidation_vma - Issue a TLB invalidation on this GT for a VMA
  * @gt: GT structure
  * @fence: invalidation fence which will be signal on TLB invalidation

+2

drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h

@@ -12,6 +12,7 @@
 
 struct xe_gt;
 struct xe_guc;
+struct xe_vm;
 struct xe_vma;
 
 int xe_gt_tlb_invalidation_init_early(struct xe_gt *gt);
@@ -21,6 +22,7 @@
 int xe_gt_tlb_invalidation_vma(struct xe_gt *gt,
 			       struct xe_gt_tlb_invalidation_fence *fence,
 			       struct xe_vma *vma);
+void xe_gt_tlb_invalidation_vm(struct xe_gt *gt, struct xe_vm *vm);
 int xe_gt_tlb_invalidation_range(struct xe_gt *gt,
 				 struct xe_gt_tlb_invalidation_fence *fence,
 				 u64 start, u64 end, u32 asid);

+13

drivers/gpu/drm/xe/xe_gt_topology.h

@@ -25,6 +25,19 @@
 
 void xe_gt_topology_dump(struct xe_gt *gt, struct drm_printer *p);
 
+/**
+ * xe_gt_topology_mask_last_dss() - Returns the index of the last DSS in a mask.
+ * @mask: Input DSS mask
+ *
+ * Return: Index of the last DSS in the input DSS mask,
+ *	   XE_MAX_DSS_FUSE_BITS if DSS mask is empty.
+ */
+static inline unsigned int
+xe_gt_topology_mask_last_dss(const xe_dss_mask_t mask)
+{
+	return find_last_bit(mask, XE_MAX_DSS_FUSE_BITS);
+}
+
 unsigned int
 xe_dss_mask_group_ffs(const xe_dss_mask_t mask, int groupsize, int groupnum);
 
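
The new helper documents that it returns `XE_MAX_DSS_FUSE_BITS` when the mask is empty, which is the `find_last_bit()` convention of returning the bitmap size as a "not found" sentinel. The sketch below is a userspace stand-in for that convention, not the kernel implementation; `last_set_bit` and `BITS_PER_WORD` are hypothetical names, and the sketch only handles bitmaps whose size is a whole number of words.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)

/*
 * Userspace stand-in for find_last_bit(): return the index of the highest
 * set bit in a bitmap of nbits bits, or nbits if no bit is set.
 * For brevity, nbits must be a multiple of the word size.
 */
static size_t last_set_bit(const unsigned long *mask, size_t nbits)
{
	size_t word = nbits / BITS_PER_WORD;

	assert(mask && nbits % BITS_PER_WORD == 0);

	/* Walk words from the top down, then bits from the top down. */
	while (word--) {
		unsigned long w = mask[word];
		size_t bit = BITS_PER_WORD;

		if (!w)
			continue;
		while (bit--)
			if (w & (1UL << bit))
				return word * BITS_PER_WORD + bit;
	}
	return nbits;
}
```

A caller that adds 1 to the result to size an array, as the EU stall stream init code in this merge does, would get `nbits + 1` for an empty mask, so such a caller presumably relies on the mask never being empty.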

+14 -1

drivers/gpu/drm/xe/xe_gt_types.h

@@ -139,7 +139,7 @@
 	/** @stats: GT stats */
 	struct {
 		/** @stats.counters: counters for various GT stats */
-		atomic_t counters[__XE_GT_STATS_NUM_IDS];
+		atomic64_t counters[__XE_GT_STATS_NUM_IDS];
 	} stats;
 #endif
 
@@ -413,6 +413,16 @@
 		bool oob_initialized;
 	} wa_active;
 
+	/** @tuning_active: keep track of active tunings */
+	struct {
+		/** @tuning_active.gt: bitmap with active GT tunings */
+		unsigned long *gt;
+		/** @tuning_active.engine: bitmap with active engine tunings */
+		unsigned long *engine;
+		/** @tuning_active.lrc: bitmap with active LRC tunings */
+		unsigned long *lrc;
+	} tuning_active;
+
 	/** @user_engines: engines present in GT and available to userspace */
 	struct {
 		/**
@@ -430,6 +440,9 @@
 
 	/** @oa: oa observation subsystem per gt info */
 	struct xe_oa_gt oa;
+
+	/** @eu_stall: EU stall counters subsystem per gt info */
+	struct xe_eu_stall_gt *eu_stall;
 };
 
 #endif

+5

drivers/gpu/drm/xe/xe_guc.c

@@ -27,6 +27,7 @@
 #include "xe_guc_capture.h"
 #include "xe_guc_ct.h"
 #include "xe_guc_db_mgr.h"
+#include "xe_guc_engine_activity.h"
 #include "xe_guc_hwconfig.h"
 #include "xe_guc_log.h"
 #include "xe_guc_pc.h"
@@ -741,6 +742,10 @@
 		return ret;
 
 	ret = xe_guc_pc_init(&guc->pc);
+	if (ret)
+		return ret;
+
+	ret = xe_guc_engine_activity_init(guc);
 	if (ret)
 		return ret;
 

+1 -1

drivers/gpu/drm/xe/xe_guc_ads.c

@@ -342,7 +342,7 @@
 	offset = guc_ads_waklv_offset(ads);
 	remain = guc_ads_waklv_size(ads);
 
-	if (XE_WA(gt, 14019882105))
+	if (XE_WA(gt, 14019882105) || XE_WA(gt, 16021333562))
 		guc_waklv_enable_simple(ads,
 					GUC_WORKAROUND_KLV_BLOCK_INTERRUPTS_WHEN_MGSR_BLOCKED,
 					&offset, &remain);

+373

drivers/gpu/drm/xe/xe_guc_engine_activity.c

@@ -0,0 +1,373 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2025 Intel Corporation
+ */
+
+#include <drm/drm_managed.h>
+
+#include "abi/guc_actions_abi.h"
+#include "regs/xe_gt_regs.h"
+
+#include "xe_bo.h"
+#include "xe_force_wake.h"
+#include "xe_gt_printk.h"
+#include "xe_guc.h"
+#include "xe_guc_engine_activity.h"
+#include "xe_guc_ct.h"
+#include "xe_hw_engine.h"
+#include "xe_map.h"
+#include "xe_mmio.h"
+#include "xe_trace_guc.h"
+
+#define TOTAL_QUANTA 0x8000
+
+static struct iosys_map engine_activity_map(struct xe_guc *guc, struct xe_hw_engine *hwe)
+{
+	struct xe_guc_engine_activity *engine_activity = &guc->engine_activity;
+	struct engine_activity_buffer *buffer = &engine_activity->device_buffer;
+	u16 guc_class = xe_engine_class_to_guc_class(hwe->class);
+	size_t offset;
+
+	offset = offsetof(struct guc_engine_activity_data,
+			  engine_activity[guc_class][hwe->logical_instance]);
+
+	return IOSYS_MAP_INIT_OFFSET(&buffer->activity_bo->vmap, offset);
+}
+
+static struct iosys_map engine_metadata_map(struct xe_guc *guc)
+{
+	struct xe_guc_engine_activity *engine_activity = &guc->engine_activity;
+	struct engine_activity_buffer *buffer = &engine_activity->device_buffer;
+
+	return buffer->metadata_bo->vmap;
+}
+
+static int allocate_engine_activity_group(struct xe_guc *guc)
+{
+	struct xe_guc_engine_activity *engine_activity = &guc->engine_activity;
+	struct xe_device *xe = guc_to_xe(guc);
+	u32 num_activity_group = 1; /* Will be modified for VF */
+
+	engine_activity->eag = drmm_kcalloc(&xe->drm, num_activity_group,
+					    sizeof(struct engine_activity_group), GFP_KERNEL);
+
+	if (!engine_activity->eag)
+		return -ENOMEM;
+
+	engine_activity->num_activity_group = num_activity_group;
+
+	return 0;
+}
+
+static int allocate_engine_activity_buffers(struct xe_guc *guc,
+					    struct engine_activity_buffer *buffer)
+{
+	u32 metadata_size = sizeof(struct guc_engine_activity_metadata);
+	u32 size = sizeof(struct guc_engine_activity_data);
+	struct xe_gt *gt = guc_to_gt(guc);
+	struct xe_tile *tile = gt_to_tile(gt);
+	struct xe_bo *bo, *metadata_bo;
+
+	metadata_bo = xe_bo_create_pin_map(gt_to_xe(gt), tile, NULL, PAGE_ALIGN(metadata_size),
+					   ttm_bo_type_kernel, XE_BO_FLAG_SYSTEM |
+					   XE_BO_FLAG_GGTT | XE_BO_FLAG_GGTT_INVALIDATE);
+
+	if (IS_ERR(metadata_bo))
+		return PTR_ERR(metadata_bo);
+
+	bo = xe_bo_create_pin_map(gt_to_xe(gt), tile, NULL, PAGE_ALIGN(size),
+				  ttm_bo_type_kernel, XE_BO_FLAG_VRAM_IF_DGFX(tile) |
+				  XE_BO_FLAG_GGTT | XE_BO_FLAG_GGTT_INVALIDATE);
+
+	if (IS_ERR(bo)) {
+		xe_bo_unpin_map_no_vm(metadata_bo);
+		return PTR_ERR(bo);
+	}
+
+	buffer->metadata_bo = metadata_bo;
+	buffer->activity_bo = bo;
+	return 0;
+}
+
+static void free_engine_activity_buffers(struct engine_activity_buffer *buffer)
+{
+	xe_bo_unpin_map_no_vm(buffer->metadata_bo);
+	xe_bo_unpin_map_no_vm(buffer->activity_bo);
+}
+
+static bool is_engine_activity_supported(struct xe_guc *guc)
+{
+	struct xe_uc_fw_version *version = &guc->fw.versions.found[XE_UC_FW_VER_COMPATIBILITY];
+	struct xe_uc_fw_version required = { 1, 14, 1 };
+	struct xe_gt *gt = guc_to_gt(guc);
+
+	if (IS_SRIOV_VF(gt_to_xe(gt))) {
+		xe_gt_info(gt, "engine activity stats not supported on VFs\n");
+		return false;
+	}
+
+	/* engine activity stats is supported from GuC interface version (1.14.1) */
+	if (GUC_SUBMIT_VER(guc) < MAKE_GUC_VER_STRUCT(required)) {
+		xe_gt_info(gt,
+			   "engine activity stats unsupported in GuC interface v%u.%u.%u, need v%u.%u.%u or higher\n",
+			   version->major, version->minor, version->patch, required.major,
+			   required.minor, required.patch);
+		return false;
+	}
+
+	return true;
+}
+
+static struct engine_activity *hw_engine_to_engine_activity(struct xe_hw_engine *hwe)
+{
+	struct xe_guc *guc = &hwe->gt->uc.guc;
+	struct engine_activity_group *eag = &guc->engine_activity.eag[0];
+	u16 guc_class = xe_engine_class_to_guc_class(hwe->class);
+
+	return &eag->engine[guc_class][hwe->logical_instance];
+}
+
+static u64 cpu_ns_to_guc_tsc_tick(ktime_t ns, u32 freq)
+{
+	return mul_u64_u32_div(ns, freq, NSEC_PER_SEC);
+}
+
+#define read_engine_activity_record(xe_, map_, field_) \
+	xe_map_rd_field(xe_, map_, 0, struct guc_engine_activity, field_)
+
+#define read_metadata_record(xe_, map_, field_) \
+	xe_map_rd_field(xe_, map_, 0, struct guc_engine_activity_metadata, field_)
+
+static u64 get_engine_active_ticks(struct xe_guc *guc, struct xe_hw_engine *hwe)
+{
+	struct engine_activity *ea = hw_engine_to_engine_activity(hwe);
+	struct guc_engine_activity *cached_activity = &ea->activity;
+	struct guc_engine_activity_metadata *cached_metadata = &ea->metadata;
+	struct xe_guc_engine_activity *engine_activity = &guc->engine_activity;
+	struct iosys_map activity_map, metadata_map;
+	struct xe_device *xe = guc_to_xe(guc);
+	struct xe_gt *gt = guc_to_gt(guc);
+	u32 last_update_tick, global_change_num;
+	u64 active_ticks, gpm_ts;
+	u16 change_num;
+
+	activity_map = engine_activity_map(guc, hwe);
+	metadata_map = engine_metadata_map(guc);
+	global_change_num = read_metadata_record(xe, &metadata_map, global_change_num);
+
+	/* GuC has not initialized activity data yet, return 0 */
+	if (!global_change_num)
+		goto update;
+
+	if (global_change_num == cached_metadata->global_change_num)
+		goto update;
+
+	cached_metadata->global_change_num = global_change_num;
+	change_num = read_engine_activity_record(xe, &activity_map, change_num);
+
+	if (!change_num || change_num == cached_activity->change_num)
+		goto update;
+
+	/* read engine activity values */
+	last_update_tick = read_engine_activity_record(xe, &activity_map, last_update_tick);
+	active_ticks = read_engine_activity_record(xe, &activity_map, active_ticks);
+
+	/* activity calculations */
+	ea->running = !!last_update_tick;
+	ea->total += active_ticks - cached_activity->active_ticks;
+	ea->active = 0;
+
+	/* cache the counter */
+	cached_activity->change_num = change_num;
+	cached_activity->last_update_tick = last_update_tick;
+	cached_activity->active_ticks = active_ticks;
+
+update:
+	if (ea->running) {
+		gpm_ts = xe_mmio_read64_2x32(&gt->mmio, MISC_STATUS_0) >>
+			 engine_activity->gpm_timestamp_shift;
+		ea->active = lower_32_bits(gpm_ts) - cached_activity->last_update_tick;
+	}
+
+	trace_xe_guc_engine_activity(xe, ea, hwe->name, hwe->instance);
+
+	return ea->total + ea->active;
+}
+
+static u64 get_engine_total_ticks(struct xe_guc *guc, struct xe_hw_engine *hwe)
+{
+	struct engine_activity *ea = hw_engine_to_engine_activity(hwe);
+	struct guc_engine_activity_metadata *cached_metadata = &ea->metadata;
+	struct guc_engine_activity *cached_activity = &ea->activity;
+	struct iosys_map activity_map, metadata_map;
+	struct xe_device *xe = guc_to_xe(guc);
+	ktime_t now, cpu_delta;
+	u64 numerator;
+	u16 quanta_ratio;
+
+	activity_map = engine_activity_map(guc, hwe);
+	metadata_map = engine_metadata_map(guc);
+
+	if (!cached_metadata->guc_tsc_frequency_hz)
+		cached_metadata->guc_tsc_frequency_hz = read_metadata_record(xe, &metadata_map,
+									     guc_tsc_frequency_hz);
+
+	quanta_ratio = read_engine_activity_record(xe, &activity_map, quanta_ratio);
+	cached_activity->quanta_ratio = quanta_ratio;
+
+	/* Total ticks calculations */
+	now = ktime_get();
+	cpu_delta = now - ea->last_cpu_ts;
+	ea->last_cpu_ts = now;
+	numerator = (ea->quanta_remainder_ns + cpu_delta) * cached_activity->quanta_ratio;
+	ea->quanta_ns += numerator / TOTAL_QUANTA;
+	ea->quanta_remainder_ns = numerator % TOTAL_QUANTA;
+	ea->quanta = cpu_ns_to_guc_tsc_tick(ea->quanta_ns, cached_metadata->guc_tsc_frequency_hz);
+
+	trace_xe_guc_engine_activity(xe, ea, hwe->name, hwe->instance);
+
+	return ea->quanta;
+}
+
+static int enable_engine_activity_stats(struct xe_guc *guc)
+{
+	struct xe_guc_engine_activity *engine_activity = &guc->engine_activity;
+	struct engine_activity_buffer *buffer = &engine_activity->device_buffer;
+	u32 action[] = {
+		XE_GUC_ACTION_SET_DEVICE_ENGINE_ACTIVITY_BUFFER,
+		xe_bo_ggtt_addr(buffer->metadata_bo),
+		0,
+		xe_bo_ggtt_addr(buffer->activity_bo),
+		0,
+	};
+
+	/* Blocking here to ensure the buffers are ready before reading them */
+	return xe_guc_ct_send_block(&guc->ct, action, ARRAY_SIZE(action));
+}
+
+static void engine_activity_set_cpu_ts(struct xe_guc *guc)
+{
+	struct xe_guc_engine_activity *engine_activity = &guc->engine_activity;
+	struct engine_activity_group *eag = &engine_activity->eag[0];
+	int i, j;
+
+	for (i = 0; i < GUC_MAX_ENGINE_CLASSES; i++)
+		for (j = 0; j < GUC_MAX_INSTANCES_PER_CLASS; j++)
+			eag->engine[i][j].last_cpu_ts = ktime_get();
+}
+
+static u32 gpm_timestamp_shift(struct xe_gt *gt)
+{
+	u32 reg;
+
+	reg = xe_mmio_read32(&gt->mmio, RPM_CONFIG0);
+
+	return 3 - REG_FIELD_GET(RPM_CONFIG0_CTC_SHIFT_PARAMETER_MASK, reg);
+}
+
+/**
+ * xe_guc_engine_activity_active_ticks - Get engine active ticks
+ * @guc: The GuC object
+ * @hwe: The hw_engine object
+ *
+ * Return: accumulated ticks @hwe was active since engine activity stats were enabled.
+ */
+u64 xe_guc_engine_activity_active_ticks(struct xe_guc *guc, struct xe_hw_engine *hwe)
+{
+	if (!xe_guc_engine_activity_supported(guc))
+		return 0;
+
+	return get_engine_active_ticks(guc, hwe);
+}
+
+/**
+ * xe_guc_engine_activity_total_ticks - Get engine total ticks
+ * @guc: The GuC object
+ * @hwe: The hw_engine object
+ *
+ * Return: accumulated quanta of ticks allocated for the engine
+ */
+u64 xe_guc_engine_activity_total_ticks(struct xe_guc *guc, struct xe_hw_engine *hwe)
+{
+	if (!xe_guc_engine_activity_supported(guc))
+		return 0;
+
+	return get_engine_total_ticks(guc, hwe);
+}
+
+/**
+ * xe_guc_engine_activity_supported - Check support for engine activity stats
+ * @guc: The GuC object
+ *
+ * Engine activity stats is supported from GuC interface version (1.14.1)
+ *
+ * Return: true if engine activity stats supported, false otherwise
+ */
+bool xe_guc_engine_activity_supported(struct xe_guc *guc)
+{
+	struct xe_guc_engine_activity *engine_activity = &guc->engine_activity;
+
+	return engine_activity->supported;
+}
+
+/**
+ * xe_guc_engine_activity_enable_stats - Enable engine activity stats
+ * @guc: The GuC object
+ *
+ * Enable engine activity stats and set initial timestamps
+ */
+void xe_guc_engine_activity_enable_stats(struct xe_guc *guc)
+{
+	int ret;
+
+	if (!xe_guc_engine_activity_supported(guc))
+		return;
+
+	ret = enable_engine_activity_stats(guc);
+	if (ret)
+		xe_gt_err(guc_to_gt(guc), "failed to enable activity stats%d\n", ret);
+	else
+		engine_activity_set_cpu_ts(guc);
+}
+
+static void engine_activity_fini(void *arg)
+{
+	struct xe_guc_engine_activity *engine_activity = arg;
+	struct engine_activity_buffer *buffer = &engine_activity->device_buffer;
+
+
free_engine_activity_buffers(buffer); 339 + } 340 + 341 + /** 342 + * xe_guc_engine_activity_init - Initialize the engine activity data 343 + * @guc: The GuC object 344 + * 345 + * Return: 0 on success, negative error code otherwise. 346 + */ 347 + int xe_guc_engine_activity_init(struct xe_guc *guc) 348 + { 349 + struct xe_guc_engine_activity *engine_activity = &guc->engine_activity; 350 + struct xe_gt *gt = guc_to_gt(guc); 351 + int ret; 352 + 353 + engine_activity->supported = is_engine_activity_supported(guc); 354 + if (!engine_activity->supported) 355 + return 0; 356 + 357 + ret = allocate_engine_activity_group(guc); 358 + if (ret) { 359 + xe_gt_err(gt, "failed to allocate engine activity group (%pe)\n", ERR_PTR(ret)); 360 + return ret; 361 + } 362 + 363 + ret = allocate_engine_activity_buffers(guc, &engine_activity->device_buffer); 364 + if (ret) { 365 + xe_gt_err(gt, "failed to allocate engine activity buffers (%pe)\n", ERR_PTR(ret)); 366 + return ret; 367 + } 368 + 369 + engine_activity->gpm_timestamp_shift = gpm_timestamp_shift(gt); 370 + 371 + return devm_add_action_or_reset(gt_to_xe(gt)->drm.dev, engine_activity_fini, 372 + engine_activity); 373 + }
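Note on the total-ticks arithmetic in get_engine_total_ticks() above: each sample scales the elapsed CPU time by quanta_ratio / TOTAL_QUANTA and stores the division remainder for the next sample. The following is a minimal userspace sketch of that arithmetic, not the driver code. The value of TOTAL_QUANTA and the body of cpu_ns_to_guc_tsc_tick() are not shown in this part of the diff, so SKETCH_TOTAL_QUANTA (0x8000) and the `ns * hz / 1e9` conversion are assumptions, and all `sketch`/`quanta_state` names are hypothetical.

```c
#include <stdint.h>

/* Assumed value; the real TOTAL_QUANTA is defined outside this excerpt. */
#define SKETCH_TOTAL_QUANTA 0x8000u
#define SKETCH_NSEC_PER_SEC 1000000000ull

/* Hypothetical subset of struct engine_activity used by the calculation. */
struct quanta_state {
	uint64_t quanta_ns;           /* scaled time accumulated so far */
	uint64_t quanta_remainder_ns; /* remainder carried into the next sample */
};

/* Scale one CPU time delta by ratio / TOTAL_QUANTA, as the diff does. */
static void quanta_accumulate(struct quanta_state *s, uint64_t cpu_delta_ns,
			      uint16_t quanta_ratio)
{
	uint64_t numerator = (s->quanta_remainder_ns + cpu_delta_ns) * quanta_ratio;

	s->quanta_ns += numerator / SKETCH_TOTAL_QUANTA;
	s->quanta_remainder_ns = numerator % SKETCH_TOTAL_QUANTA;
}

/*
 * Assumed stand-in for cpu_ns_to_guc_tsc_tick(): ns * hz / 1e9.
 * The plain multiply can overflow for large inputs; it is kept simple here.
 */
static uint64_t sketch_ns_to_ticks(uint64_t ns, uint32_t freq_hz)
{
	return ns * freq_hz / SKETCH_NSEC_PER_SEC;
}
```

With a ratio of 0x4000 (half of the assumed TOTAL_QUANTA), a 1000 ns delta contributes 500 ns of quanta time.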

+19

drivers/gpu/drm/xe/xe_guc_engine_activity.h

··· 1 + /* SPDX-License-Identifier: MIT */ 2 + /* 3 + * Copyright © 2025 Intel Corporation 4 + */ 5 + 6 + #ifndef _XE_GUC_ENGINE_ACTIVITY_H_ 7 + #define _XE_GUC_ENGINE_ACTIVITY_H_ 8 + 9 + #include <linux/types.h> 10 + 11 + struct xe_hw_engine; 12 + struct xe_guc; 13 + 14 + int xe_guc_engine_activity_init(struct xe_guc *guc); 15 + bool xe_guc_engine_activity_supported(struct xe_guc *guc); 16 + void xe_guc_engine_activity_enable_stats(struct xe_guc *guc); 17 + u64 xe_guc_engine_activity_active_ticks(struct xe_guc *guc, struct xe_hw_engine *hwe); 18 + u64 xe_guc_engine_activity_total_ticks(struct xe_guc *guc, struct xe_hw_engine *hwe); 19 + #endif

+92

drivers/gpu/drm/xe/xe_guc_engine_activity_types.h

··· 1 + /* SPDX-License-Identifier: MIT */ 2 + /* 3 + * Copyright © 2025 Intel Corporation 4 + */ 5 + 6 + #ifndef _XE_GUC_ENGINE_ACTIVITY_TYPES_H_ 7 + #define _XE_GUC_ENGINE_ACTIVITY_TYPES_H_ 8 + 9 + #include <linux/types.h> 10 + 11 + #include "xe_guc_fwif.h" 12 + /** 13 + * struct engine_activity - Engine specific activity data 14 + * 15 + * Contains engine specific activity data and snapshot of the 16 + * structures from GuC 17 + */ 18 + struct engine_activity { 19 + /** @active: current activity */ 20 + u64 active; 21 + 22 + /** @last_cpu_ts: cpu timestamp in nsec of previous sample */ 23 + u64 last_cpu_ts; 24 + 25 + /** @quanta: total quanta used on HW */ 26 + u64 quanta; 27 + 28 + /** @quanta_ns: total quanta_ns used on HW */ 29 + u64 quanta_ns; 30 + 31 + /** 32 + * @quanta_remainder_ns: remainder when the CPU time is scaled as 33 + * per the quanta_ratio. This remainder is used in subsequent 34 + * quanta calculations. 35 + */ 36 + u64 quanta_remainder_ns; 37 + 38 + /** @total: total engine activity */ 39 + u64 total; 40 + 41 + /** @running: true if engine is running some work */ 42 + bool running; 43 + 44 + /** @metadata: snapshot of engine activity metadata */ 45 + struct guc_engine_activity_metadata metadata; 46 + 47 + /** @activity: snapshot of engine activity counter */ 48 + struct guc_engine_activity activity; 49 + }; 50 + 51 + /** 52 + * struct engine_activity_group - Activity data for all engines 53 + */ 54 + struct engine_activity_group { 55 + /** @engine: engine specific activity data */ 56 + struct engine_activity engine[GUC_MAX_ENGINE_CLASSES][GUC_MAX_INSTANCES_PER_CLASS]; 57 + }; 58 + 59 + /** 60 + * struct engine_activity_buffer - engine activity buffers 61 + * 62 + * This contains the buffers allocated for metadata and activity data 63 + */ 64 + struct engine_activity_buffer { 65 + /** @activity_bo: object allocated to hold activity data */ 66 + struct xe_bo *activity_bo; 67 + 68 + /** @metadata_bo: object allocated to hold activity metadata */ 
69 + struct xe_bo *metadata_bo; 70 + }; 71 + 72 + /** 73 + * struct xe_guc_engine_activity - Data used by engine activity implementation 74 + */ 75 + struct xe_guc_engine_activity { 76 + /** @gpm_timestamp_shift: Right shift value for the gpm timestamp */ 77 + u32 gpm_timestamp_shift; 78 + 79 + /** @num_activity_group: number of activity groups */ 80 + u32 num_activity_group; 81 + 82 + /** @supported: indicates support for engine activity stats */ 83 + bool supported; 84 + 85 + /** @eag: holds the device level engine activity data */ 86 + struct engine_activity_group *eag; 87 + 88 + /** @device_buffer: buffer object for global engine activity */ 89 + struct engine_activity_buffer device_buffer; 90 + }; 91 + #endif 92 +

+19

drivers/gpu/drm/xe/xe_guc_fwif.h

··· 208 208 struct guc_engine_usage_record engines[GUC_MAX_ENGINE_CLASSES][GUC_MAX_INSTANCES_PER_CLASS]; 209 209 } __packed; 210 210 211 + /* Engine Activity stats */ 212 + struct guc_engine_activity { 213 + u16 change_num; 214 + u16 quanta_ratio; 215 + u32 last_update_tick; 216 + u64 active_ticks; 217 + } __packed; 218 + 219 + struct guc_engine_activity_data { 220 + struct guc_engine_activity engine_activity[GUC_MAX_ENGINE_CLASSES][GUC_MAX_INSTANCES_PER_CLASS]; 221 + } __packed; 222 + 223 + struct guc_engine_activity_metadata { 224 + u32 guc_tsc_frequency_hz; 225 + u32 lag_latency_usec; 226 + u32 global_change_num; 227 + u32 reserved; 228 + } __packed; 229 + 211 230 /* This action will be programmed in C1BC - SOFT_SCRATCH_15_REG */ 212 231 enum xe_guc_recv_message { 213 232 XE_GUC_RECV_MSG_CRASH_DUMP_POSTED = BIT(1),
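Note on the layout added to xe_guc_fwif.h above: both records are packed and come to 16 bytes each, which can be checked at compile time. The sketch below mirrors the two structs in userspace with fixed-width types. It assumes a GCC/Clang-style `__attribute__((packed))` in place of the kernel's `__packed`; the helper sketch_engine_activity_offset() and its `instances_per_class` parameter are hypothetical, since the value of GUC_MAX_INSTANCES_PER_CLASS is not shown in this diff.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the per-engine record from the diff. */
struct guc_engine_activity {
	uint16_t change_num;
	uint16_t quanta_ratio;
	uint32_t last_update_tick;
	uint64_t active_ticks;
} __attribute__((packed));

/* Userspace mirror of the metadata record from the diff. */
struct guc_engine_activity_metadata {
	uint32_t guc_tsc_frequency_hz;
	uint32_t lag_latency_usec;
	uint32_t global_change_num;
	uint32_t reserved;
} __attribute__((packed));

_Static_assert(sizeof(struct guc_engine_activity) == 16,
	       "activity record is 2 + 2 + 4 + 8 bytes");
_Static_assert(offsetof(struct guc_engine_activity, active_ticks) == 8,
	       "active_ticks follows the three smaller fields");
_Static_assert(sizeof(struct guc_engine_activity_metadata) == 16,
	       "metadata record is four 32-bit fields");

/*
 * Hypothetical helper: byte offset of one engine's record inside the
 * [class][instance] array of struct guc_engine_activity_data.
 */
static size_t sketch_engine_activity_offset(unsigned int engine_class,
					    unsigned int instance,
					    unsigned int instances_per_class)
{
	return ((size_t)engine_class * instances_per_class + instance) *
	       sizeof(struct guc_engine_activity);
}
```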

+16

drivers/gpu/drm/xe/xe_guc_pc.c

··· 995 995 return ret; 996 996 } 997 997 998 + static int pc_action_set_strategy(struct xe_guc_pc *pc, u32 val) 999 + { 1000 + int ret = 0; 1001 + 1002 + ret = pc_action_set_param(pc, 1003 + SLPC_PARAM_STRATEGIES, 1004 + val); 1005 + 1006 + return ret; 1007 + } 1008 + 998 1009 /** 999 1010 * xe_guc_pc_start - Start GuC's Power Conservation component 1000 1011 * @pc: Xe_GuC_PC instance ··· 1065 1054 } 1066 1055 1067 1056 ret = pc_action_setup_gucrc(pc, GUCRC_FIRMWARE_CONTROL); 1057 + if (ret) 1058 + goto out; 1059 + 1060 + /* Enable SLPC Optimized Strategy for compute */ 1061 + ret = pc_action_set_strategy(pc, SLPC_OPTIMIZED_STRATEGY_COMPUTE); 1068 1062 1069 1063 out: 1070 1064 xe_force_wake_put(gt_to_fw(gt), fw_ref);

+10

drivers/gpu/drm/xe/xe_guc_submit.c

··· 15 15 #include <drm/drm_managed.h> 16 16 17 17 #include "abi/guc_actions_abi.h" 18 + #include "abi/guc_actions_slpc_abi.h" 18 19 #include "abi/guc_klvs_abi.h" 19 20 #include "regs/xe_lrc_layout.h" 20 21 #include "xe_assert.h" ··· 401 400 MAKE_EXEC_QUEUE_POLICY_ADD(execution_quantum, EXECUTION_QUANTUM) 402 401 MAKE_EXEC_QUEUE_POLICY_ADD(preemption_timeout, PREEMPTION_TIMEOUT) 403 402 MAKE_EXEC_QUEUE_POLICY_ADD(priority, SCHEDULING_PRIORITY) 403 + MAKE_EXEC_QUEUE_POLICY_ADD(slpc_exec_queue_freq_req, SLPM_GT_FREQUENCY) 404 404 #undef MAKE_EXEC_QUEUE_POLICY_ADD 405 405 406 406 static const int xe_exec_queue_prio_to_guc[] = { ··· 416 414 struct exec_queue_policy policy; 417 415 enum xe_exec_queue_priority prio = q->sched_props.priority; 418 416 u32 timeslice_us = q->sched_props.timeslice_us; 417 + u32 slpc_exec_queue_freq_req = 0; 419 418 u32 preempt_timeout_us = q->sched_props.preempt_timeout_us; 420 419 421 420 xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q)); 421 + 422 + if (q->flags & EXEC_QUEUE_FLAG_LOW_LATENCY) 423 + slpc_exec_queue_freq_req |= SLPC_CTX_FREQ_REQ_IS_COMPUTE; 422 424 423 425 __guc_exec_queue_policy_start_klv(&policy, q->guc->id); 424 426 __guc_exec_queue_policy_add_priority(&policy, xe_exec_queue_prio_to_guc[prio]); 425 427 __guc_exec_queue_policy_add_execution_quantum(&policy, timeslice_us); 426 428 __guc_exec_queue_policy_add_preemption_timeout(&policy, preempt_timeout_us); 429 + __guc_exec_queue_policy_add_slpc_exec_queue_freq_req(&policy, 430 + slpc_exec_queue_freq_req); 427 431 428 432 xe_guc_ct_send(&guc->ct, (u32 *)&policy.h2g, 429 433 __guc_exec_queue_policy_action_size(&policy), 0, 0); ··· 1256 1248 1257 1249 if (xe_exec_queue_is_lr(q)) 1258 1250 cancel_work_sync(&ge->lr_tdr); 1251 + /* Confirm no work left behind accessing device structures */ 1252 + cancel_delayed_work_sync(&ge->sched.base.work_tdr); 1259 1253 release_guc_id(guc, q); 1260 1254 xe_sched_entity_fini(&ge->entity); 1261 1255 xe_sched_fini(&ge->sched);
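Note on the xe_guc_submit.c change above: a new policy adder is generated by one more MAKE_EXEC_QUEUE_POLICY_ADD() line, and the value passed to it is derived from the exec queue flags. The macro body and the policy struct are not part of this diff, so the following is only a sketch of the general pattern (a macro that stamps out one "append key/value" helper per policy). Every name prefixed with `sketch_` or `SKETCH_`, the key numbers, and the flag bit values are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_KLVS 8

/* Hypothetical key IDs; the real KLV IDs live in the GuC ABI headers. */
enum {
	SKETCH_KLV_SCHEDULING_PRIORITY = 1,
	SKETCH_KLV_SLPM_GT_FREQUENCY = 2,
};

/* Hypothetical flat list of key/value pairs sent as one policy update. */
struct sketch_policy {
	uint32_t count;
	struct {
		uint32_t key;
		uint32_t value;
	} klv[SKETCH_MAX_KLVS];
};

/* One macro invocation generates one typed adder function. */
#define SKETCH_MAKE_POLICY_ADD(func, id)					\
static void sketch_policy_add_##func(struct sketch_policy *p, uint32_t data)	\
{										\
	assert(p->count < SKETCH_MAX_KLVS);					\
	p->klv[p->count].key = SKETCH_KLV_##id;					\
	p->klv[p->count].value = data;						\
	p->count++;								\
}

SKETCH_MAKE_POLICY_ADD(priority, SCHEDULING_PRIORITY)
SKETCH_MAKE_POLICY_ADD(slpc_exec_queue_freq_req, SLPM_GT_FREQUENCY)
#undef SKETCH_MAKE_POLICY_ADD

/* Hypothetical bit values standing in for the real flag definitions. */
#define SKETCH_FLAG_LOW_LATENCY    (1u << 0)
#define SKETCH_FREQ_REQ_IS_COMPUTE (1u << 0)

/* Translate queue flags into the frequency request, as the diff does. */
static uint32_t sketch_freq_req(uint32_t queue_flags)
{
	uint32_t req = 0;

	if (queue_flags & SKETCH_FLAG_LOW_LATENCY)
		req |= SKETCH_FREQ_REQ_IS_COMPUTE;

	return req;
}
```

In this sketch a queue without the low-latency flag still sends the key, with a value of 0, which matches the unconditional add call in the diff.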

+4

drivers/gpu/drm/xe/xe_guc_types.h

··· 13 13 #include "xe_guc_ads_types.h" 14 14 #include "xe_guc_buf_types.h" 15 15 #include "xe_guc_ct_types.h" 16 + #include "xe_guc_engine_activity_types.h" 16 17 #include "xe_guc_fwif.h" 17 18 #include "xe_guc_log_types.h" 18 19 #include "xe_guc_pc_types.h" ··· 103 102 104 103 /** @relay: GuC Relay Communication used in SR-IOV */ 105 104 struct xe_guc_relay relay; 105 + 106 + /** @engine_activity: Device specific engine activity */ 107 + struct xe_guc_engine_activity engine_activity; 106 108 107 109 /** 108 110 * @notify_reg: Register which is written to notify GuC of H2G messages

+16 -23

drivers/gpu/drm/xe/xe_heci_gsc.c

··· 89 89 kfree(adev); 90 90 } 91 91 92 - void xe_heci_gsc_fini(struct xe_device *xe) 92 + static void xe_heci_gsc_fini(void *arg) 93 93 { 94 - struct xe_heci_gsc *heci_gsc = &xe->heci_gsc; 95 - 96 - if (!xe->info.has_heci_gscfi && !xe->info.has_heci_cscfi) 97 - return; 94 + struct xe_heci_gsc *heci_gsc = arg; 98 95 99 96 if (heci_gsc->adev) { 100 97 struct auxiliary_device *aux_dev = &heci_gsc->adev->aux_dev; ··· 103 106 104 107 if (heci_gsc->irq >= 0) 105 108 irq_free_desc(heci_gsc->irq); 109 + 106 110 heci_gsc->irq = -1; 107 111 } 108 112 ··· 170 172 return ret; 171 173 } 172 174 173 - void xe_heci_gsc_init(struct xe_device *xe) 175 + int xe_heci_gsc_init(struct xe_device *xe) 174 176 { 175 177 struct xe_heci_gsc *heci_gsc = &xe->heci_gsc; 176 - const struct heci_gsc_def *def; 178 + const struct heci_gsc_def *def = NULL; 177 179 int ret; 178 180 179 181 if (!xe->info.has_heci_gscfi && !xe->info.has_heci_cscfi) 180 - return; 182 + return 0; 181 183 182 184 heci_gsc->irq = -1; 183 185 ··· 189 191 def = &heci_gsc_def_dg2; 190 192 } else if (xe->info.platform == XE_DG1) { 191 193 def = &heci_gsc_def_dg1; 192 - } else { 193 - drm_warn_once(&xe->drm, "Unknown platform\n"); 194 - return; 195 194 } 196 195 197 - if (!def->name) { 198 - drm_warn_once(&xe->drm, "HECI is not implemented!\n"); 199 - return; 196 + if (!def || !def->name) { 197 + drm_warn(&xe->drm, "HECI is not implemented!\n"); 198 + return 0; 200 199 } 201 200 202 - if (!def->use_polling && !xe_survivability_mode_enabled(xe)) { 201 + ret = devm_add_action_or_reset(xe->drm.dev, xe_heci_gsc_fini, heci_gsc); 202 + if (ret) 203 + return ret; 204 + 205 + if (!def->use_polling && !xe_survivability_mode_is_enabled(xe)) { 203 206 ret = heci_gsc_irq_setup(xe); 204 207 if (ret) 205 - goto fail; 208 + return ret; 206 209 } 207 210 208 - ret = heci_gsc_add_device(xe, def); 209 - if (ret) 210 - goto fail; 211 - 212 - return; 213 - fail: 214 - xe_heci_gsc_fini(xe); 211 + return heci_gsc_add_device(xe, def); 215 212 } 216 
213 217 214 void xe_heci_gsc_irq_handler(struct xe_device *xe, u32 iir)
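Note on the xe_heci_gsc.c rework above: xe_heci_gsc_init() now registers xe_heci_gsc_fini() with devm_add_action_or_reset() before the IRQ setup and device add steps, so later failures can simply return and leave cleanup to device teardown. The following userspace sketch imitates that ordering with a tiny action list. It is an analogy under stated assumptions, not the devres implementation; all `sketch_` names are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAX_ACTIONS 8

typedef void (*sketch_action_fn)(void *arg);

/* Hypothetical stand-in for the per-device list of managed actions. */
struct sketch_dev {
	size_t count;
	struct {
		sketch_action_fn fn;
		void *arg;
	} actions[SKETCH_MAX_ACTIONS];
};

/* Register a cleanup action; if registration fails, run it right away. */
static int sketch_add_action_or_reset(struct sketch_dev *dev,
				      sketch_action_fn fn, void *arg)
{
	if (dev->count == SKETCH_MAX_ACTIONS) {
		fn(arg);
		return -ENOMEM;
	}

	dev->actions[dev->count].fn = fn;
	dev->actions[dev->count].arg = arg;
	dev->count++;

	return 0;
}

/* Run the registered actions in reverse order, as on device teardown. */
static void sketch_release_all(struct sketch_dev *dev)
{
	while (dev->count) {
		dev->count--;
		dev->actions[dev->count].fn(dev->actions[dev->count].arg);
	}
}

/* Hypothetical resource with the same "irq = -1 means unset" convention. */
struct sketch_gsc {
	int irq;
};

static void sketch_gsc_fini(void *arg)
{
	struct sketch_gsc *gsc = arg;

	gsc->irq = -1;
}
```

Because the action is registered first, the cleanup function has to tolerate partially initialized state, which is why the fini path in the diff checks `heci_gsc->adev` and `heci_gsc->irq >= 0` before releasing anything.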

+1 -2

drivers/gpu/drm/xe/xe_heci_gsc.h

··· 33 33 int irq; 34 34 }; 35 35 36 - void xe_heci_gsc_init(struct xe_device *xe); 37 - void xe_heci_gsc_fini(struct xe_device *xe); 36 + int xe_heci_gsc_init(struct xe_device *xe); 38 37 void xe_heci_gsc_irq_handler(struct xe_device *xe, u32 iir); 39 38 void xe_heci_csc_irq_handler(struct xe_device *xe, u32 iir); 40 39

+143 -51

drivers/gpu/drm/xe/xe_hmm.c

··· 19 19 return (end - start) >> PAGE_SHIFT; 20 20 } 21 21 22 - /* 22 + /** 23 23 * xe_mark_range_accessed() - mark a range is accessed, so core mm 24 24 * have such information for memory eviction or write back to 25 25 * hard disk 26 - * 27 26 * @range: the range to mark 28 27 * @write: if write to this range, we mark pages in this range 29 28 * as dirty ··· 42 43 } 43 44 } 44 45 45 - /* 46 + static int xe_alloc_sg(struct xe_device *xe, struct sg_table *st, 47 + struct hmm_range *range, struct rw_semaphore *notifier_sem) 48 + { 49 + unsigned long i, npages, hmm_pfn; 50 + unsigned long num_chunks = 0; 51 + int ret; 52 + 53 + /* HMM docs says this is needed. */ 54 + ret = down_read_interruptible(notifier_sem); 55 + if (ret) 56 + return ret; 57 + 58 + if (mmu_interval_read_retry(range->notifier, range->notifier_seq)) { 59 + up_read(notifier_sem); 60 + return -EAGAIN; 61 + } 62 + 63 + npages = xe_npages_in_range(range->start, range->end); 64 + for (i = 0; i < npages;) { 65 + unsigned long len; 66 + 67 + hmm_pfn = range->hmm_pfns[i]; 68 + xe_assert(xe, hmm_pfn & HMM_PFN_VALID); 69 + 70 + len = 1UL << hmm_pfn_to_map_order(hmm_pfn); 71 + 72 + /* If order > 0 the page may extend beyond range->start */ 73 + len -= (hmm_pfn & ~HMM_PFN_FLAGS) & (len - 1); 74 + i += len; 75 + num_chunks++; 76 + } 77 + up_read(notifier_sem); 78 + 79 + return sg_alloc_table(st, num_chunks, GFP_KERNEL); 80 + } 81 + 82 + /** 46 83 * xe_build_sg() - build a scatter gather table for all the physical pages/pfn 47 84 * in a hmm_range. dma-map pages if necessary. dma-address is save in sg table 48 85 * and will be used to program GPU page table later. 49 - * 50 86 * @xe: the xe device who will access the dma-address in sg table 51 87 * @range: the hmm range that we build the sg table from. range->hmm_pfns[] 52 88 * has the pfn numbers of pages that back up this hmm address range. 53 89 * @st: pointer to the sg table. 90 + * @notifier_sem: The xe notifier lock. 
54 91 * @write: whether we write to this range. This decides dma map direction 55 92 * for system pages. If write we map it bi-diretional; otherwise 56 93 * DMA_TO_DEVICE ··· 113 78 * Returns 0 if successful; -ENOMEM if fails to allocate memory 114 79 */ 115 80 static int xe_build_sg(struct xe_device *xe, struct hmm_range *range, 116 - struct sg_table *st, bool write) 81 + struct sg_table *st, 82 + struct rw_semaphore *notifier_sem, 83 + bool write) 117 84 { 85 + unsigned long npages = xe_npages_in_range(range->start, range->end); 118 86 struct device *dev = xe->drm.dev; 119 - struct page **pages; 120 - u64 i, npages; 121 - int ret; 87 + struct scatterlist *sgl; 88 + struct page *page; 89 + unsigned long i, j; 122 90 123 - npages = xe_npages_in_range(range->start, range->end); 124 - pages = kvmalloc_array(npages, sizeof(*pages), GFP_KERNEL); 125 - if (!pages) 126 - return -ENOMEM; 91 + lockdep_assert_held(notifier_sem); 127 92 128 - for (i = 0; i < npages; i++) { 129 - pages[i] = hmm_pfn_to_page(range->hmm_pfns[i]); 130 - xe_assert(xe, !is_device_private_page(pages[i])); 93 + i = 0; 94 + for_each_sg(st->sgl, sgl, st->nents, j) { 95 + unsigned long hmm_pfn, size; 96 + 97 + hmm_pfn = range->hmm_pfns[i]; 98 + page = hmm_pfn_to_page(hmm_pfn); 99 + xe_assert(xe, !is_device_private_page(page)); 100 + 101 + size = 1UL << hmm_pfn_to_map_order(hmm_pfn); 102 + size -= page_to_pfn(page) & (size - 1); 103 + i += size; 104 + 105 + if (unlikely(j == st->nents - 1)) { 106 + if (i > npages) 107 + size -= (i - npages); 108 + sg_mark_end(sgl); 109 + } 110 + sg_set_page(sgl, page, size << PAGE_SHIFT, 0); 131 111 } 112 + xe_assert(xe, i == npages); 132 113 133 - ret = sg_alloc_table_from_pages_segment(st, pages, npages, 0, npages << PAGE_SHIFT, 134 - xe_sg_segment_size(dev), GFP_KERNEL); 135 - if (ret) 136 - goto free_pages; 137 - 138 - ret = dma_map_sgtable(dev, st, write ? 
DMA_BIDIRECTIONAL : DMA_TO_DEVICE, 139 - DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_NO_KERNEL_MAPPING); 140 - if (ret) { 141 - sg_free_table(st); 142 - st = NULL; 143 - } 144 - 145 - free_pages: 146 - kvfree(pages); 147 - return ret; 114 + return dma_map_sgtable(dev, st, write ? DMA_BIDIRECTIONAL : DMA_TO_DEVICE, 115 + DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_NO_KERNEL_MAPPING); 148 116 } 149 117 150 - /* 118 + static void xe_hmm_userptr_set_mapped(struct xe_userptr_vma *uvma) 119 + { 120 + struct xe_userptr *userptr = &uvma->userptr; 121 + struct xe_vm *vm = xe_vma_vm(&uvma->vma); 122 + 123 + lockdep_assert_held_write(&vm->lock); 124 + lockdep_assert_held(&vm->userptr.notifier_lock); 125 + 126 + mutex_lock(&userptr->unmap_mutex); 127 + xe_assert(vm->xe, !userptr->mapped); 128 + userptr->mapped = true; 129 + mutex_unlock(&userptr->unmap_mutex); 130 + } 131 + 132 + void xe_hmm_userptr_unmap(struct xe_userptr_vma *uvma) 133 + { 134 + struct xe_userptr *userptr = &uvma->userptr; 135 + struct xe_vma *vma = &uvma->vma; 136 + bool write = !xe_vma_read_only(vma); 137 + struct xe_vm *vm = xe_vma_vm(vma); 138 + struct xe_device *xe = vm->xe; 139 + 140 + if (!lockdep_is_held_type(&vm->userptr.notifier_lock, 0) && 141 + !lockdep_is_held_type(&vm->lock, 0) && 142 + !(vma->gpuva.flags & XE_VMA_DESTROYED)) { 143 + /* Don't unmap in exec critical section. */ 144 + xe_vm_assert_held(vm); 145 + /* Don't unmap while mapping the sg. */ 146 + lockdep_assert_held(&vm->lock); 147 + } 148 + 149 + mutex_lock(&userptr->unmap_mutex); 150 + if (userptr->sg && userptr->mapped) 151 + dma_unmap_sgtable(xe->drm.dev, userptr->sg, 152 + write ? 
DMA_BIDIRECTIONAL : DMA_TO_DEVICE, 0); 153 + userptr->mapped = false; 154 + mutex_unlock(&userptr->unmap_mutex); 155 + } 156 + 157 + /** 151 158 * xe_hmm_userptr_free_sg() - Free the scatter gather table of userptr 152 - * 153 159 * @uvma: the userptr vma which hold the scatter gather table 154 160 * 155 161 * With function xe_userptr_populate_range, we allocate storage of ··· 200 124 void xe_hmm_userptr_free_sg(struct xe_userptr_vma *uvma) 201 125 { 202 126 struct xe_userptr *userptr = &uvma->userptr; 203 - struct xe_vma *vma = &uvma->vma; 204 - bool write = !xe_vma_read_only(vma); 205 - struct xe_vm *vm = xe_vma_vm(vma); 206 - struct xe_device *xe = vm->xe; 207 - struct device *dev = xe->drm.dev; 208 127 209 - xe_assert(xe, userptr->sg); 210 - dma_unmap_sgtable(dev, userptr->sg, 211 - write ? DMA_BIDIRECTIONAL : DMA_TO_DEVICE, 0); 212 - 128 + xe_assert(xe_vma_vm(&uvma->vma)->xe, userptr->sg); 129 + xe_hmm_userptr_unmap(uvma); 213 130 sg_free_table(userptr->sg); 214 131 userptr->sg = NULL; 215 132 } ··· 235 166 { 236 167 unsigned long timeout = 237 168 jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT); 238 - unsigned long *pfns, flags = HMM_PFN_REQ_FAULT; 169 + unsigned long *pfns; 239 170 struct xe_userptr *userptr; 240 171 struct xe_vma *vma = &uvma->vma; 241 172 u64 userptr_start = xe_vma_userptr(vma); 242 173 u64 userptr_end = userptr_start + xe_vma_size(vma); 243 174 struct xe_vm *vm = xe_vma_vm(vma); 244 - struct hmm_range hmm_range; 175 + struct hmm_range hmm_range = { 176 + .pfn_flags_mask = 0, /* ignore pfns */ 177 + .default_flags = HMM_PFN_REQ_FAULT, 178 + .start = userptr_start, 179 + .end = userptr_end, 180 + .notifier = &uvma->userptr.notifier, 181 + .dev_private_owner = vm->xe, 182 + }; 245 183 bool write = !xe_vma_read_only(vma); 246 184 unsigned long notifier_seq; 247 185 u64 npages; ··· 275 199 return -ENOMEM; 276 200 277 201 if (write) 278 - flags |= HMM_PFN_REQ_WRITE; 202 + hmm_range.default_flags |= HMM_PFN_REQ_WRITE; 279 203 280 204 if 
(!mmget_not_zero(userptr->notifier.mm)) { 281 205 ret = -EFAULT; 282 206 goto free_pfns; 283 207 } 284 208 285 - hmm_range.default_flags = flags; 286 209 hmm_range.hmm_pfns = pfns; 287 - hmm_range.notifier = &userptr->notifier; 288 - hmm_range.start = userptr_start; 289 - hmm_range.end = userptr_end; 290 - hmm_range.dev_private_owner = vm->xe; 291 210 292 211 while (true) { 293 212 hmm_range.notifier_seq = mmu_interval_read_begin(&userptr->notifier); ··· 309 238 if (ret) 310 239 goto free_pfns; 311 240 312 - ret = xe_build_sg(vm->xe, &hmm_range, &userptr->sgt, write); 241 + ret = xe_alloc_sg(vm->xe, &userptr->sgt, &hmm_range, &vm->userptr.notifier_lock); 313 242 if (ret) 314 243 goto free_pfns; 315 244 245 + ret = down_read_interruptible(&vm->userptr.notifier_lock); 246 + if (ret) 247 + goto free_st; 248 + 249 + if (mmu_interval_read_retry(hmm_range.notifier, hmm_range.notifier_seq)) { 250 + ret = -EAGAIN; 251 + goto out_unlock; 252 + } 253 + 254 + ret = xe_build_sg(vm->xe, &hmm_range, &userptr->sgt, 255 + &vm->userptr.notifier_lock, write); 256 + if (ret) 257 + goto out_unlock; 258 + 316 259 xe_mark_range_accessed(&hmm_range, write); 317 260 userptr->sg = &userptr->sgt; 261 + xe_hmm_userptr_set_mapped(uvma); 318 262 userptr->notifier_seq = hmm_range.notifier_seq; 263 + up_read(&vm->userptr.notifier_lock); 264 + kvfree(pfns); 265 + return 0; 319 266 267 + out_unlock: 268 + up_read(&vm->userptr.notifier_lock); 269 + free_st: 270 + sg_free_table(&userptr->sgt); 320 271 free_pfns: 321 272 kvfree(pfns); 322 273 return ret; 323 274 } 324 -
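Note on xe_alloc_sg() above: the scatter-gather table is sized by walking the pfn array and stepping over each mapping in one go, where a higher-order mapping counts as one chunk and only the part of it at or after the current pfn is skipped. The sketch below reproduces that counting loop in userspace. The `struct sketch_pfn` representation (a plain pfn plus a map order) is an assumption that replaces the encoded hmm_pfn values and hmm_pfn_to_map_order().

```c
#include <stddef.h>

/* Hypothetical decoded form of one hmm_pfns[] entry. */
struct sketch_pfn {
	unsigned long pfn;  /* page frame number backing this index */
	unsigned int order; /* log2 of the CPU mapping size, in pages */
};

/* Count how many sg entries are needed to cover npages. */
static size_t sketch_count_chunks(const struct sketch_pfn *pfns, size_t npages)
{
	size_t i = 0, chunks = 0;

	while (i < npages) {
		unsigned long len = 1UL << pfns[i].order;

		/* The range may start in the middle of a large mapping. */
		len -= pfns[i].pfn & (len - 1);
		i += len;
		chunks++;
	}

	return chunks;
}
```

For a six-page range that starts on the last page of one order-2 mapping, continues through a full order-2 mapping, and ends with a single page, the loop yields three chunks.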

+7

drivers/gpu/drm/xe/xe_hmm.h

··· 3 3 * Copyright © 2024 Intel Corporation 4 4 */ 5 5 6 + #ifndef _XE_HMM_H_ 7 + #define _XE_HMM_H_ 8 + 6 9 #include <linux/types.h> 7 10 8 11 struct xe_userptr_vma; 9 12 10 13 int xe_hmm_userptr_populate_range(struct xe_userptr_vma *uvma, bool is_mm_mmap_locked); 14 + 11 15 void xe_hmm_userptr_free_sg(struct xe_userptr_vma *uvma); 16 + 17 + void xe_hmm_userptr_unmap(struct xe_userptr_vma *uvma); 18 + #endif

+1

drivers/gpu/drm/xe/xe_hw_engine_group.c

··· 178 178 up_write(&group->mode_sem); 179 179 return err; 180 180 } 181 + ALLOW_ERROR_INJECTION(xe_hw_engine_group_add_exec_queue, ERRNO); 181 182 182 183 /** 183 184 * xe_hw_engine_group_del_exec_queue() - Delete an exec queue from a hw engine group

+175

drivers/gpu/drm/xe/xe_migrate.c

··· 1544 1544 dma_fence_wait(m->fence, false); 1545 1545 } 1546 1546 1547 + static u32 pte_update_cmd_size(u64 size) 1548 + { 1549 + u32 num_dword; 1550 + u64 entries = DIV_ROUND_UP(size, XE_PAGE_SIZE); 1551 + 1552 + XE_WARN_ON(size > MAX_PREEMPTDISABLE_TRANSFER); 1553 + /* 1554 + * MI_STORE_DATA_IMM command is used to update page table. Each 1555 + * instruction can update maximumly 0x1ff pte entries. To update 1556 + * n (n <= 0x1ff) pte entries, we need: 1557 + * 1 dword for the MI_STORE_DATA_IMM command header (opcode etc) 1558 + * 2 dword for the page table's physical location 1559 + * 2*n dword for value of pte to fill (each pte entry is 2 dwords) 1560 + */ 1561 + num_dword = (1 + 2) * DIV_ROUND_UP(entries, 0x1ff); 1562 + num_dword += entries * 2; 1563 + 1564 + return num_dword; 1565 + } 1566 + 1567 + static void build_pt_update_batch_sram(struct xe_migrate *m, 1568 + struct xe_bb *bb, u32 pt_offset, 1569 + dma_addr_t *sram_addr, u32 size) 1570 + { 1571 + u16 pat_index = tile_to_xe(m->tile)->pat.idx[XE_CACHE_WB]; 1572 + u32 ptes; 1573 + int i = 0; 1574 + 1575 + ptes = DIV_ROUND_UP(size, XE_PAGE_SIZE); 1576 + while (ptes) { 1577 + u32 chunk = min(0x1ffU, ptes); 1578 + 1579 + bb->cs[bb->len++] = MI_STORE_DATA_IMM | MI_SDI_NUM_QW(chunk); 1580 + bb->cs[bb->len++] = pt_offset; 1581 + bb->cs[bb->len++] = 0; 1582 + 1583 + pt_offset += chunk * 8; 1584 + ptes -= chunk; 1585 + 1586 + while (chunk--) { 1587 + u64 addr = sram_addr[i++] & PAGE_MASK; 1588 + 1589 + xe_tile_assert(m->tile, addr); 1590 + addr = m->q->vm->pt_ops->pte_encode_addr(m->tile->xe, 1591 + addr, pat_index, 1592 + 0, false, 0); 1593 + bb->cs[bb->len++] = lower_32_bits(addr); 1594 + bb->cs[bb->len++] = upper_32_bits(addr); 1595 + } 1596 + } 1597 + } 1598 + 1599 + enum xe_migrate_copy_dir { 1600 + XE_MIGRATE_COPY_TO_VRAM, 1601 + XE_MIGRATE_COPY_TO_SRAM, 1602 + }; 1603 + 1604 + static struct dma_fence *xe_migrate_vram(struct xe_migrate *m, 1605 + unsigned long npages, 1606 + dma_addr_t *sram_addr, u64 
vram_addr, 1607 + const enum xe_migrate_copy_dir dir) 1608 + { 1609 + struct xe_gt *gt = m->tile->primary_gt; 1610 + struct xe_device *xe = gt_to_xe(gt); 1611 + struct dma_fence *fence = NULL; 1612 + u32 batch_size = 2; 1613 + u64 src_L0_ofs, dst_L0_ofs; 1614 + u64 round_update_size; 1615 + struct xe_sched_job *job; 1616 + struct xe_bb *bb; 1617 + u32 update_idx, pt_slot = 0; 1618 + int err; 1619 + 1620 + if (npages * PAGE_SIZE > MAX_PREEMPTDISABLE_TRANSFER) 1621 + return ERR_PTR(-EINVAL); 1622 + 1623 + round_update_size = npages * PAGE_SIZE; 1624 + batch_size += pte_update_cmd_size(round_update_size); 1625 + batch_size += EMIT_COPY_DW; 1626 + 1627 + bb = xe_bb_new(gt, batch_size, true); 1628 + if (IS_ERR(bb)) { 1629 + err = PTR_ERR(bb); 1630 + return ERR_PTR(err); 1631 + } 1632 + 1633 + build_pt_update_batch_sram(m, bb, pt_slot * XE_PAGE_SIZE, 1634 + sram_addr, round_update_size); 1635 + 1636 + if (dir == XE_MIGRATE_COPY_TO_VRAM) { 1637 + src_L0_ofs = xe_migrate_vm_addr(pt_slot, 0); 1638 + dst_L0_ofs = xe_migrate_vram_ofs(xe, vram_addr, false); 1639 + 1640 + } else { 1641 + src_L0_ofs = xe_migrate_vram_ofs(xe, vram_addr, false); 1642 + dst_L0_ofs = xe_migrate_vm_addr(pt_slot, 0); 1643 + } 1644 + 1645 + bb->cs[bb->len++] = MI_BATCH_BUFFER_END; 1646 + update_idx = bb->len; 1647 + 1648 + emit_copy(gt, bb, src_L0_ofs, dst_L0_ofs, round_update_size, 1649 + XE_PAGE_SIZE); 1650 + 1651 + job = xe_bb_create_migration_job(m->q, bb, 1652 + xe_migrate_batch_base(m, true), 1653 + update_idx); 1654 + if (IS_ERR(job)) { 1655 + err = PTR_ERR(job); 1656 + goto err; 1657 + } 1658 + 1659 + xe_sched_job_add_migrate_flush(job, 0); 1660 + 1661 + mutex_lock(&m->job_mutex); 1662 + xe_sched_job_arm(job); 1663 + fence = dma_fence_get(&job->drm.s_fence->finished); 1664 + xe_sched_job_push(job); 1665 + 1666 + dma_fence_put(m->fence); 1667 + m->fence = dma_fence_get(fence); 1668 + mutex_unlock(&m->job_mutex); 1669 + 1670 + xe_bb_free(bb, fence); 1671 + 1672 + return fence; 1673 + 1674 + err: 
1675 + xe_bb_free(bb, NULL); 1676 + 1677 + return ERR_PTR(err); 1678 + } 1679 + 1680 + /** 1681 + * xe_migrate_to_vram() - Migrate to VRAM 1682 + * @m: The migration context. 1683 + * @npages: Number of pages to migrate. 1684 + * @src_addr: Array of dma addresses (source of migrate) 1685 + * @dst_addr: Device physical address of VRAM (destination of migrate) 1686 + * 1687 + * Copy from an array of dma addresses to a VRAM device physical address 1688 + * 1689 + * Return: dma fence for migrate to signal completion on success, ERR_PTR on 1690 + * failure 1691 + */ 1692 + struct dma_fence *xe_migrate_to_vram(struct xe_migrate *m, 1693 + unsigned long npages, 1694 + dma_addr_t *src_addr, 1695 + u64 dst_addr) 1696 + { 1697 + return xe_migrate_vram(m, npages, src_addr, dst_addr, 1698 + XE_MIGRATE_COPY_TO_VRAM); 1699 + } 1700 + 1701 + /** 1702 + * xe_migrate_from_vram() - Migrate from VRAM 1703 + * @m: The migration context. 1704 + * @npages: Number of pages to migrate. 1705 + * @src_addr: Device physical address of VRAM (source of migrate) 1706 + * @dst_addr: Array of dma addresses (destination of migrate) 1707 + * 1708 + * Copy from a VRAM device physical address to an array of dma addresses 1709 + * 1710 + * Return: dma fence for migrate to signal completion on success, ERR_PTR on 1711 + * failure 1712 + */ 1713 + struct dma_fence *xe_migrate_from_vram(struct xe_migrate *m, 1714 + unsigned long npages, 1715 + u64 src_addr, 1716 + dma_addr_t *dst_addr) 1717 + { 1718 + return xe_migrate_vram(m, npages, dst_addr, src_addr, 1719 + XE_MIGRATE_COPY_TO_SRAM); 1720 + } 1721 + 1547 1722 #if IS_ENABLED(CONFIG_DRM_XE_KUNIT_TEST) 1548 1723 #include "tests/xe_migrate.c" 1549 1724 #endif
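Note on pte_update_cmd_size() above: the comment in the diff gives the cost per MI_STORE_DATA_IMM command as 1 header dword, 2 address dwords, and 2 dwords per PTE, with at most 0x1ff PTEs per command. The sketch below works that formula through in userspace. SKETCH_PAGE_SIZE (4096) is an assumed value for XE_PAGE_SIZE, and the `sketch_`/`SKETCH_` names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE        4096u  /* assumed value of XE_PAGE_SIZE */
#define SKETCH_MAX_PTES_PER_CMD 0x1ffu /* limit stated in the diff comment */

#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Number of batch dwords needed to write PTEs covering `size` bytes. */
static uint32_t sketch_pte_update_cmd_dwords(uint64_t size)
{
	uint64_t entries = SKETCH_DIV_ROUND_UP(size, SKETCH_PAGE_SIZE);
	uint32_t dwords;

	/* 1 header dword + 2 address dwords per store command */
	dwords = (1 + 2) * (uint32_t)SKETCH_DIV_ROUND_UP(entries,
							 SKETCH_MAX_PTES_PER_CMD);
	/* 2 dwords of payload per PTE */
	dwords += (uint32_t)(entries * 2);

	return dwords;
}
```

Under these assumptions one page costs 5 dwords, 0x1ff pages cost 1025 dwords, and 0x200 pages need a second command for a total of 1030 dwords.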

+10

drivers/gpu/drm/xe/xe_migrate.h

··· 95 95 96 96 struct xe_migrate *xe_migrate_init(struct xe_tile *tile); 97 97 98 + struct dma_fence *xe_migrate_to_vram(struct xe_migrate *m, 99 + unsigned long npages, 100 + dma_addr_t *src_addr, 101 + u64 dst_addr); 102 + 103 + struct dma_fence *xe_migrate_from_vram(struct xe_migrate *m, 104 + unsigned long npages, 105 + u64 src_addr, 106 + dma_addr_t *dst_addr); 107 + 98 108 struct dma_fence *xe_migrate_copy(struct xe_migrate *m, 99 109 struct xe_bo *src_bo, 100 110 struct xe_bo *dst_bo,

+7

drivers/gpu/drm/xe/xe_module.c

··· 22 22 .guc_log_level = 3, 23 23 .force_probe = CONFIG_DRM_XE_FORCE_PROBE, 24 24 .wedged_mode = 1, 25 + .svm_notifier_size = 512, 25 26 /* the rest are 0 by default */ 26 27 }; 28 + 29 + module_param_named(svm_notifier_size, xe_modparam.svm_notifier_size, uint, 0600); 30 + MODULE_PARM_DESC(svm_notifier_size, "Set the svm notifier size (in MiB), must be a power of 2"); 31 + 32 + module_param_named(always_migrate_to_vram, xe_modparam.always_migrate_to_vram, bool, 0444); 33 + MODULE_PARM_DESC(always_migrate_to_vram, "Always migrate to VRAM on GPU fault"); 27 34 28 35 module_param_named_unsafe(force_execlist, xe_modparam.force_execlist, bool, 0444); 29 36 MODULE_PARM_DESC(force_execlist, "Force Execlist submission");
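Note on the svm_notifier_size parameter above: the description requires a power-of-two value in MiB, with 512 as the default. This hunk does not show where or how the driver validates the value, so the following is only a sketch of what such a check and the MiB-to-bytes conversion could look like. Both helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* A power of two has exactly one bit set; zero is rejected. */
static bool sketch_is_power_of_2(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Convert a size given in MiB to bytes without 32-bit overflow. */
static uint64_t sketch_mib_to_bytes(uint32_t mib)
{
	return (uint64_t)mib << 20;
}
```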

+2

drivers/gpu/drm/xe/xe_module.h

··· 12 12 struct xe_modparam { 13 13 bool force_execlist; 14 14 bool probe_display; 15 + bool always_migrate_to_vram; 15 16 u32 force_vram_bar_size; 16 17 int guc_log_level; 17 18 char *guc_firmware_path; ··· 23 22 unsigned int max_vfs; 24 23 #endif 25 24 int wedged_mode; 25 + u32 svm_notifier_size; 26 26 }; 27 27 28 28 extern struct xe_modparam xe_modparam;

+12 -23

drivers/gpu/drm/xe/xe_oa.c

··· 12 12 #include <drm/drm_managed.h> 13 13 #include <uapi/drm/xe_drm.h> 14 14 15 + #include <generated/xe_wa_oob.h> 16 + 15 17 #include "abi/guc_actions_slpc_abi.h" 16 18 #include "instructions/xe_mi_commands.h" 17 19 #include "regs/xe_engine_regs.h" ··· 37 35 #include "xe_sched_job.h" 38 36 #include "xe_sriov.h" 39 37 #include "xe_sync.h" 38 + #include "xe_wa.h" 40 39 41 40 #define DEFAULT_POLL_FREQUENCY_HZ 200 42 41 #define DEFAULT_POLL_PERIOD_NS (NSEC_PER_SEC / DEFAULT_POLL_FREQUENCY_HZ) ··· 815 812 struct xe_mmio *mmio = &stream->gt->mmio; 816 813 u32 sqcnt1; 817 814 818 - /* 819 - * Wa_1508761755:xehpsdv, dg2 820 - * Enable thread stall DOP gating and EU DOP gating. 821 - */ 822 - if (stream->oa->xe->info.platform == XE_DG2) { 815 + /* Enable thread stall DOP gating and EU DOP gating. */ 816 + if (XE_WA(stream->gt, 1508761755)) { 823 817 xe_gt_mcr_multicast_write(stream->gt, ROW_CHICKEN, 824 818 _MASKED_BIT_DISABLE(STALL_DOP_GATING_DISABLE)); 825 819 xe_gt_mcr_multicast_write(stream->gt, ROW_CHICKEN2, ··· 1065 1065 int ret; 1066 1066 1067 1067 /* 1068 - * Wa_1508761755:xehpsdv, dg2 1069 1068 * EU NOA signals behave incorrectly if EU clock gating is enabled. 1070 1069 * Disable thread stall DOP gating and EU DOP gating. 
1071 1070 */ 1072 - if (stream->oa->xe->info.platform == XE_DG2) { 1071 + if (XE_WA(stream->gt, 1508761755)) { 1073 1072 xe_gt_mcr_multicast_write(stream->gt, ROW_CHICKEN, 1074 1073 _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE)); 1075 1074 xe_gt_mcr_multicast_write(stream->gt, ROW_CHICKEN2, ··· 1689 1690 stream->oa_buffer.format = &stream->oa->oa_formats[param->oa_format]; 1690 1691 1691 1692 stream->sample = param->sample; 1692 - stream->periodic = param->period_exponent > 0; 1693 + stream->periodic = param->period_exponent >= 0; 1693 1694 stream->period_exponent = param->period_exponent; 1694 1695 stream->no_preempt = param->no_preempt; 1695 1696 stream->wait_num_reports = param->wait_num_reports; ··· 1719 1720 } 1720 1721 1721 1722 /* 1722 - * Wa_1509372804:pvc 1723 - * 1724 1723 * GuC reset of engines causes OA to lose configuration 1725 1724 * state. Prevent this by overriding GUCRC mode. 1726 1725 */ 1727 - if (stream->oa->xe->info.platform == XE_PVC) { 1726 + if (XE_WA(stream->gt, 1509372804)) { 1728 1727 ret = xe_guc_pc_override_gucrc_mode(&gt->uc.guc.pc, 1729 1728 SLPC_GUCRC_MODE_GUCRC_NO_RC6); 1730 1729 if (ret) ··· 1854 1857 { 1855 1858 u32 reg, shift; 1856 1859 1857 - /* 1858 - * Wa_18013179988:dg2 1859 - * Wa_14015568240:pvc 1860 - * Wa_14015846243:mtl 1861 - */ 1862 - switch (gt_to_xe(gt)->info.platform) { 1863 - case XE_DG2: 1864 - case XE_PVC: 1865 - case XE_METEORLAKE: 1860 + if (XE_WA(gt, 18013179988) || XE_WA(gt, 14015568240)) { 1866 1861 xe_pm_runtime_get(gt_to_xe(gt)); 1867 1862 reg = xe_mmio_read32(&gt->mmio, RPM_CONFIG0); 1868 1863 xe_pm_runtime_put(gt_to_xe(gt)); 1869 1864 1870 1865 shift = REG_FIELD_GET(RPM_CONFIG0_CTC_SHIFT_PARAMETER_MASK, reg); 1871 1866 return gt->info.reference_clock << (3 - shift); 1872 - 1873 - default: 1867 + } else { 1874 1868 return gt->info.reference_clock; 1875 1869 } 1876 1870 } ··· 1959 1971 } 1960 1972 1961 1973 param.xef = xef; 1974 + param.period_exponent = -1; 1962 1975 ret = xe_oa_user_extensions(oa, 
XE_OA_USER_EXTN_FROM_OPEN, data, 0, &param); 1963 1976 if (ret) 1964 1977 return ret; ··· 2014 2025 goto err_exec_q; 2015 2026 } 2016 2027 2017 - if (param.period_exponent > 0) { 2028 + if (param.period_exponent >= 0) { 2018 2029 u64 oa_period, oa_freq_hz; 2019 2030 2020 2031 /* Requesting samples from OAG buffer is a privileged operation */
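
Note on the `period_exponent` change: the hunks above initialise `param.period_exponent` to -1 before parsing user extensions and switch both checks from `> 0` to `>= 0`. Read together, that looks like a sentinel pattern in which -1 means "no period requested" and an exponent of 0 becomes a valid periodic request. A minimal sketch of that pattern, assuming a signed field; the struct and function names here are hypothetical stand-ins, not the driver's types:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the open parameters; only the relevant field. */
struct sketch_oa_param {
	int period_exponent;	/* -1 means the user did not request a period */
};

/* Set the sentinel before any user-supplied properties are applied. */
static void sketch_oa_param_init(struct sketch_oa_param *p)
{
	p->period_exponent = -1;
}

/* Any non-negative exponent, including 0, counts as periodic sampling. */
static bool sketch_oa_is_periodic(const struct sketch_oa_param *p)
{
	return p->period_exponent >= 0;
}
```

Under the old `> 0` test with a zero-initialised field, an explicit exponent of 0 could not be told apart from "not set"; the sentinel removes that ambiguity.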

+14

drivers/gpu/drm/xe/xe_observation.c

··· 8 8 9 9 #include <uapi/drm/xe_drm.h> 10 10 11 + #include "xe_eu_stall.h" 11 12 #include "xe_oa.h" 12 13 #include "xe_observation.h" 13 14 ··· 25 24 return xe_oa_add_config_ioctl(dev, arg->param, file); 26 25 case DRM_XE_OBSERVATION_OP_REMOVE_CONFIG: 27 26 return xe_oa_remove_config_ioctl(dev, arg->param, file); 27 + default: 28 + return -EINVAL; 29 + } 30 + } 31 + 32 + static int xe_eu_stall_ioctl(struct drm_device *dev, struct drm_xe_observation_param *arg, 33 + struct drm_file *file) 34 + { 35 + switch (arg->observation_op) { 36 + case DRM_XE_OBSERVATION_OP_STREAM_OPEN: 37 + return xe_eu_stall_stream_open(dev, arg->param, file); 28 38 default: 29 39 return -EINVAL; 30 40 } ··· 63 51 switch (arg->observation_type) { 64 52 case DRM_XE_OBSERVATION_TYPE_OA: 65 53 return xe_oa_ioctl(dev, arg, file); 54 + case DRM_XE_OBSERVATION_TYPE_EU_STALL: 55 + return xe_eu_stall_ioctl(dev, arg, file); 66 56 default: 67 57 return -EINVAL; 68 58 }

+99 -146

drivers/gpu/drm/xe/xe_pci.c

··· 46 46 47 47 struct xe_device_desc { 48 48 /* Should only ever be set for platforms without GMD_ID */ 49 - const struct xe_graphics_desc *graphics; 49 + const struct xe_ip *pre_gmdid_graphics_ip; 50 50 /* Should only ever be set for platforms without GMD_ID */ 51 - const struct xe_media_desc *media; 51 + const struct xe_ip *pre_gmdid_media_ip; 52 52 53 53 const char *platform_name; 54 54 const struct xe_subplatform_desc *subplatforms; ··· 82 82 #define NOP(x) x 83 83 84 84 static const struct xe_graphics_desc graphics_xelp = { 85 - .name = "Xe_LP", 86 - .ver = 12, 87 - .rel = 0, 88 - 89 - .hw_engine_mask = BIT(XE_HW_ENGINE_RCS0) | BIT(XE_HW_ENGINE_BCS0), 90 - 91 - .va_bits = 48, 92 - .vm_max_level = 3, 93 - }; 94 - 95 - static const struct xe_graphics_desc graphics_xelpp = { 96 - .name = "Xe_LP+", 97 - .ver = 12, 98 - .rel = 10, 99 - 100 85 .hw_engine_mask = BIT(XE_HW_ENGINE_RCS0) | BIT(XE_HW_ENGINE_BCS0), 101 86 102 87 .va_bits = 48, ··· 94 109 .vm_max_level = 3 95 110 96 111 static const struct xe_graphics_desc graphics_xehpg = { 97 - .name = "Xe_HPG", 98 - .ver = 12, 99 - .rel = 55, 100 - 101 112 .hw_engine_mask = 102 113 BIT(XE_HW_ENGINE_RCS0) | BIT(XE_HW_ENGINE_BCS0) | 103 114 BIT(XE_HW_ENGINE_CCS0) | BIT(XE_HW_ENGINE_CCS1) | ··· 106 125 }; 107 126 108 127 static const struct xe_graphics_desc graphics_xehpc = { 109 - .name = "Xe_HPC", 110 - .ver = 12, 111 - .rel = 60, 112 - 113 128 .hw_engine_mask = 114 129 BIT(XE_HW_ENGINE_BCS0) | BIT(XE_HW_ENGINE_BCS1) | 115 130 BIT(XE_HW_ENGINE_BCS2) | BIT(XE_HW_ENGINE_BCS3) | ··· 126 149 }; 127 150 128 151 static const struct xe_graphics_desc graphics_xelpg = { 129 - .name = "Xe_LPG", 130 152 .hw_engine_mask = 131 153 BIT(XE_HW_ENGINE_RCS0) | BIT(XE_HW_ENGINE_BCS0) | 132 154 BIT(XE_HW_ENGINE_CCS0), ··· 148 172 GENMASK(XE_HW_ENGINE_CCS3, XE_HW_ENGINE_CCS0) 149 173 150 174 static const struct xe_graphics_desc graphics_xe2 = { 151 - .name = "Xe2_LPG / Xe2_HPG / Xe3_LPG", 152 - 153 175 XE2_GFX_FEATURES, 154 176 }; 155 177 
156 178 static const struct xe_media_desc media_xem = { 157 - .name = "Xe_M", 158 - .ver = 12, 159 - .rel = 0, 160 - 161 - .hw_engine_mask = 162 - GENMASK(XE_HW_ENGINE_VCS7, XE_HW_ENGINE_VCS0) | 163 - GENMASK(XE_HW_ENGINE_VECS3, XE_HW_ENGINE_VECS0), 164 - }; 165 - 166 - static const struct xe_media_desc media_xehpm = { 167 - .name = "Xe_HPM", 168 - .ver = 12, 169 - .rel = 55, 170 - 171 179 .hw_engine_mask = 172 180 GENMASK(XE_HW_ENGINE_VCS7, XE_HW_ENGINE_VCS0) | 173 181 GENMASK(XE_HW_ENGINE_VECS3, XE_HW_ENGINE_VECS0), 174 182 }; 175 183 176 184 static const struct xe_media_desc media_xelpmp = { 177 - .name = "Xe_LPM+", 178 185 .hw_engine_mask = 179 186 GENMASK(XE_HW_ENGINE_VCS7, XE_HW_ENGINE_VCS0) | 180 187 GENMASK(XE_HW_ENGINE_VECS3, XE_HW_ENGINE_VECS0) | 181 188 BIT(XE_HW_ENGINE_GSCCS0) 182 189 }; 183 190 184 - static const struct xe_media_desc media_xe2 = { 185 - .name = "Xe2_LPM / Xe2_HPM / Xe3_LPM", 186 - .hw_engine_mask = 187 - GENMASK(XE_HW_ENGINE_VCS7, XE_HW_ENGINE_VCS0) | 188 - GENMASK(XE_HW_ENGINE_VECS3, XE_HW_ENGINE_VECS0) | 189 - BIT(XE_HW_ENGINE_GSCCS0) 191 + /* Pre-GMDID Graphics IPs */ 192 + static const struct xe_ip graphics_ip_xelp = { 1200, "Xe_LP", &graphics_xelp }; 193 + static const struct xe_ip graphics_ip_xelpp = { 1210, "Xe_LP+", &graphics_xelp }; 194 + static const struct xe_ip graphics_ip_xehpg = { 1255, "Xe_HPG", &graphics_xehpg }; 195 + static const struct xe_ip graphics_ip_xehpc = { 1260, "Xe_HPC", &graphics_xehpc }; 196 + 197 + /* GMDID-based Graphics IPs */ 198 + static const struct xe_ip graphics_ips[] = { 199 + { 1270, "Xe_LPG", &graphics_xelpg }, 200 + { 1271, "Xe_LPG", &graphics_xelpg }, 201 + { 1274, "Xe_LPG+", &graphics_xelpg }, 202 + { 2001, "Xe2_HPG", &graphics_xe2 }, 203 + { 2004, "Xe2_LPG", &graphics_xe2 }, 204 + { 3000, "Xe3_LPG", &graphics_xe2 }, 205 + { 3001, "Xe3_LPG", &graphics_xe2 }, 206 + }; 207 + 208 + /* Pre-GMDID Media IPs */ 209 + static const struct xe_ip media_ip_xem = { 1200, "Xe_M", &media_xem }; 210 + static 
const struct xe_ip media_ip_xehpm = { 1255, "Xe_HPM", &media_xem }; 211 + 212 + /* GMDID-based Media IPs */ 213 + static const struct xe_ip media_ips[] = { 214 + { 1300, "Xe_LPM+", &media_xelpmp }, 215 + { 1301, "Xe2_HPM", &media_xelpmp }, 216 + { 2000, "Xe2_LPM", &media_xelpmp }, 217 + { 3000, "Xe3_LPM", &media_xelpmp }, 190 218 }; 191 219 192 220 static const struct xe_device_desc tgl_desc = { 193 - .graphics = &graphics_xelp, 194 - .media = &media_xem, 221 + .pre_gmdid_graphics_ip = &graphics_ip_xelp, 222 + .pre_gmdid_media_ip = &media_ip_xem, 195 223 PLATFORM(TIGERLAKE), 196 224 .dma_mask_size = 39, 197 225 .has_display = true, ··· 204 224 }; 205 225 206 226 static const struct xe_device_desc rkl_desc = { 207 - .graphics = &graphics_xelp, 208 - .media = &media_xem, 227 + .pre_gmdid_graphics_ip = &graphics_ip_xelp, 228 + .pre_gmdid_media_ip = &media_ip_xem, 209 229 PLATFORM(ROCKETLAKE), 210 230 .dma_mask_size = 39, 211 231 .has_display = true, ··· 216 236 static const u16 adls_rpls_ids[] = { INTEL_RPLS_IDS(NOP), 0 }; 217 237 218 238 static const struct xe_device_desc adl_s_desc = { 219 - .graphics = &graphics_xelp, 220 - .media = &media_xem, 239 + .pre_gmdid_graphics_ip = &graphics_ip_xelp, 240 + .pre_gmdid_media_ip = &media_ip_xem, 221 241 PLATFORM(ALDERLAKE_S), 222 242 .dma_mask_size = 39, 223 243 .has_display = true, ··· 232 252 static const u16 adlp_rplu_ids[] = { INTEL_RPLU_IDS(NOP), 0 }; 233 253 234 254 static const struct xe_device_desc adl_p_desc = { 235 - .graphics = &graphics_xelp, 236 - .media = &media_xem, 255 + .pre_gmdid_graphics_ip = &graphics_ip_xelp, 256 + .pre_gmdid_media_ip = &media_ip_xem, 237 257 PLATFORM(ALDERLAKE_P), 238 258 .dma_mask_size = 39, 239 259 .has_display = true, ··· 246 266 }; 247 267 248 268 static const struct xe_device_desc adl_n_desc = { 249 - .graphics = &graphics_xelp, 250 - .media = &media_xem, 269 + .pre_gmdid_graphics_ip = &graphics_ip_xelp, 270 + .pre_gmdid_media_ip = &media_ip_xem, 251 271 PLATFORM(ALDERLAKE_N), 252 
272 .dma_mask_size = 39, 253 273 .has_display = true, ··· 259 279 .is_dgfx = 1 260 280 261 281 static const struct xe_device_desc dg1_desc = { 262 - .graphics = &graphics_xelpp, 263 - .media = &media_xem, 282 + .pre_gmdid_graphics_ip = &graphics_ip_xelpp, 283 + .pre_gmdid_media_ip = &media_ip_xem, 264 284 DGFX_FEATURES, 265 285 PLATFORM(DG1), 266 286 .dma_mask_size = 39, ··· 285 305 } 286 306 287 307 static const struct xe_device_desc ats_m_desc = { 288 - .graphics = &graphics_xehpg, 289 - .media = &media_xehpm, 308 + .pre_gmdid_graphics_ip = &graphics_ip_xehpg, 309 + .pre_gmdid_media_ip = &media_ip_xehpm, 290 310 .dma_mask_size = 46, 291 311 .require_force_probe = true, 292 312 ··· 295 315 }; 296 316 297 317 static const struct xe_device_desc dg2_desc = { 298 - .graphics = &graphics_xehpg, 299 - .media = &media_xehpm, 318 + .pre_gmdid_graphics_ip = &graphics_ip_xehpg, 319 + .pre_gmdid_media_ip = &media_ip_xehpm, 300 320 .dma_mask_size = 46, 301 321 .require_force_probe = true, 302 322 ··· 305 325 }; 306 326 307 327 static const __maybe_unused struct xe_device_desc pvc_desc = { 308 - .graphics = &graphics_xehpc, 328 + .pre_gmdid_graphics_ip = &graphics_ip_xehpc, 309 329 DGFX_FEATURES, 310 330 PLATFORM(PVC), 311 331 .dma_mask_size = 52, ··· 349 369 350 370 #undef PLATFORM 351 371 __diag_pop(); 352 - 353 - /* Map of GMD_ID values to graphics IP */ 354 - static const struct gmdid_map graphics_ip_map[] = { 355 - { 1270, &graphics_xelpg }, 356 - { 1271, &graphics_xelpg }, 357 - { 1274, &graphics_xelpg }, /* Xe_LPG+ */ 358 - { 2001, &graphics_xe2 }, 359 - { 2004, &graphics_xe2 }, 360 - { 3000, &graphics_xe2 }, 361 - { 3001, &graphics_xe2 }, 362 - }; 363 - 364 - /* Map of GMD_ID values to media IP */ 365 - static const struct gmdid_map media_ip_map[] = { 366 - { 1300, &media_xelpmp }, 367 - { 1301, &media_xe2 }, 368 - { 2000, &media_xe2 }, 369 - { 3000, &media_xe2 }, 370 - }; 371 372 372 373 /* 373 374 * Make sure any device matches here are from most specific to most ··· 
510 549 } 511 550 512 551 /* 513 - * Pre-GMD_ID platform: device descriptor already points to the appropriate 514 - * graphics descriptor. Simply forward the description and calculate the version 515 - * appropriately. "graphics" should be present in all such platforms, while 516 - * media is optional. 517 - */ 518 - static void handle_pre_gmdid(struct xe_device *xe, 519 - const struct xe_graphics_desc *graphics, 520 - const struct xe_media_desc *media) 521 - { 522 - xe->info.graphics_verx100 = graphics->ver * 100 + graphics->rel; 523 - 524 - if (media) 525 - xe->info.media_verx100 = media->ver * 100 + media->rel; 526 - 527 - } 528 - 529 - /* 530 - * GMD_ID platform: read IP version from hardware and select graphics descriptor 552 + * Read IP version from hardware and select graphics/media IP descriptors 531 553 * based on the result. 532 554 */ 533 555 static void handle_gmdid(struct xe_device *xe, 534 - const struct xe_graphics_desc **graphics, 535 - const struct xe_media_desc **media, 556 + const struct xe_ip **graphics_ip, 557 + const struct xe_ip **media_ip, 536 558 u32 *graphics_revid, 537 559 u32 *media_revid) 538 560 { 539 561 u32 ver; 540 562 563 + *graphics_ip = NULL; 564 + *media_ip = NULL; 565 + 541 566 read_gmdid(xe, GMDID_GRAPHICS, &ver, graphics_revid); 542 567 543 - for (int i = 0; i < ARRAY_SIZE(graphics_ip_map); i++) { 544 - if (ver == graphics_ip_map[i].ver) { 545 - xe->info.graphics_verx100 = ver; 546 - *graphics = graphics_ip_map[i].ip; 568 + for (int i = 0; i < ARRAY_SIZE(graphics_ips); i++) { 569 + if (ver == graphics_ips[i].verx100) { 570 + *graphics_ip = &graphics_ips[i]; 547 571 548 572 break; 549 573 } 550 574 } 551 575 552 - if (!xe->info.graphics_verx100) { 576 + if (!*graphics_ip) { 553 577 drm_err(&xe->drm, "Hardware reports unknown graphics version %u.%02u\n", 554 578 ver / 100, ver % 100); 555 579 } 556 580 557 581 read_gmdid(xe, GMDID_MEDIA, &ver, media_revid); 558 - 559 582 /* Media may legitimately be fused off / not present */ 
560 583 if (ver == 0) 561 584 return; 562 585 563 - for (int i = 0; i < ARRAY_SIZE(media_ip_map); i++) { 564 - if (ver == media_ip_map[i].ver) { 565 - xe->info.media_verx100 = ver; 566 - *media = media_ip_map[i].ip; 586 + for (int i = 0; i < ARRAY_SIZE(media_ips); i++) { 587 + if (ver == media_ips[i].verx100) { 588 + *media_ip = &media_ips[i]; 567 589 568 590 break; 569 591 } 570 592 } 571 593 572 - if (!xe->info.media_verx100) { 594 + if (!*media_ip) { 573 595 drm_err(&xe->drm, "Hardware reports unknown media version %u.%02u\n", 574 596 ver / 100, ver % 100); 575 597 } ··· 603 659 * present in device info. 604 660 */ 605 661 static int xe_info_init(struct xe_device *xe, 606 - const struct xe_graphics_desc *graphics_desc, 607 - const struct xe_media_desc *media_desc) 662 + const struct xe_device_desc *desc) 608 663 { 609 664 u32 graphics_gmdid_revid = 0, media_gmdid_revid = 0; 665 + const struct xe_ip *graphics_ip; 666 + const struct xe_ip *media_ip; 667 + const struct xe_graphics_desc *graphics_desc; 668 + const struct xe_media_desc *media_desc; 610 669 struct xe_tile *tile; 611 670 struct xe_gt *gt; 612 671 u8 id; 613 672 614 673 /* 615 674 * If this platform supports GMD_ID, we'll detect the proper IP 616 - * descriptor to use from hardware registers. desc->graphics will only 617 - * ever be set at this point for platforms before GMD_ID. In that case 618 - * the IP descriptions and versions are simply derived from that. 675 + * descriptor to use from hardware registers. 676 + * desc->pre_gmdid_graphics_ip will only ever be set at this point for 677 + * platforms before GMD_ID. In that case the IP descriptions and 678 + * versions are simply derived from that. 
619 679 */ 620 - if (graphics_desc) { 621 - handle_pre_gmdid(xe, graphics_desc, media_desc); 680 + if (desc->pre_gmdid_graphics_ip) { 681 + graphics_ip = desc->pre_gmdid_graphics_ip; 682 + media_ip = desc->pre_gmdid_media_ip; 622 683 xe->info.step = xe_step_pre_gmdid_get(xe); 623 684 } else { 624 - xe_assert(xe, !media_desc); 625 - handle_gmdid(xe, &graphics_desc, &media_desc, 685 + xe_assert(xe, !desc->pre_gmdid_media_ip); 686 + handle_gmdid(xe, &graphics_ip, &media_ip, 626 687 &graphics_gmdid_revid, &media_gmdid_revid); 627 688 xe->info.step = xe_step_gmdid_get(xe, 628 689 graphics_gmdid_revid, ··· 639 690 * error and we should abort driver load. Failing to detect media 640 691 * IP is non-fatal; we'll just proceed without enabling media support. 641 692 */ 642 - if (!graphics_desc) 693 + if (!graphics_ip) 643 694 return -ENODEV; 644 695 645 - xe->info.graphics_name = graphics_desc->name; 646 - xe->info.media_name = media_desc ? media_desc->name : "none"; 696 + xe->info.graphics_verx100 = graphics_ip->verx100; 697 + xe->info.graphics_name = graphics_ip->name; 698 + graphics_desc = graphics_ip->desc; 699 + 700 + if (media_ip) { 701 + xe->info.media_verx100 = media_ip->verx100; 702 + xe->info.media_name = media_ip->name; 703 + media_desc = media_ip->desc; 704 + } else { 705 + xe->info.media_name = "none"; 706 + media_desc = NULL; 707 + } 647 708 648 709 xe->info.vram_flags = graphics_desc->vram_flags; 649 710 xe->info.va_bits = graphics_desc->va_bits; ··· 724 765 725 766 static void xe_pci_remove(struct pci_dev *pdev) 726 767 { 727 - struct xe_device *xe; 728 - 729 - xe = pdev_to_xe_device(pdev); 730 - if (!xe) /* driver load aborted, nothing to cleanup */ 731 - return; 768 + struct xe_device *xe = pdev_to_xe_device(pdev); 732 769 733 770 if (IS_SRIOV_PF(xe)) 734 771 xe_pci_sriov_configure(pdev, 0); 735 772 736 - if (xe_survivability_mode_enabled(xe)) 737 - return xe_survivability_mode_remove(xe); 773 + if (xe_survivability_mode_is_enabled(xe)) 774 + return; 738 
775 739 776 xe_device_remove(xe); 740 777 xe_pm_runtime_fini(xe); 741 - pci_set_drvdata(pdev, NULL); 742 778 } 743 779 744 780 /* ··· 805 851 err = xe_device_probe_early(xe); 806 852 807 853 /* 808 - * In Boot Survivability mode, no drm card is exposed 809 - * and driver is loaded with bare minimum to allow 810 - * for firmware to be flashed through mei. Return 811 - * success if survivability mode is enabled. 854 + * In Boot Survivability mode, no drm card is exposed and driver is 855 + * loaded with bare minimum to allow for firmware to be flashed through 856 + * mei. If early probe fails, check if survivability mode is flagged by 857 + * HW to be enabled. In that case enable it and return success. 812 858 */ 813 859 if (err) { 814 - if (xe_survivability_mode_enabled(xe)) 860 + if (xe_survivability_mode_required(xe) && 861 + xe_survivability_mode_enable(xe)) 815 862 return 0; 816 863 817 864 return err; 818 865 } 819 866 820 - err = xe_info_init(xe, desc->graphics, desc->media); 867 + err = xe_info_init(xe, desc); 821 868 if (err) 822 869 return err; 823 870 ··· 855 900 return err; 856 901 857 902 err = xe_device_probe(xe); 858 - if (err) { 859 - xe_device_call_remove_actions(xe); 903 + if (err) 860 904 return err; 861 - } 862 905 863 906 err = xe_pm_init(xe); 864 907 if (err) ··· 906 953 struct xe_device *xe = pdev_to_xe_device(pdev); 907 954 int err; 908 955 909 - if (xe_survivability_mode_enabled(xe)) 956 + if (xe_survivability_mode_is_enabled(xe)) 910 957 return -EBUSY; 911 958 912 959 err = xe_pm_suspend(xe);
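
Note on the `struct xe_ip` tables: after this change, `handle_gmdid()` finds the IP by a linear search on `verx100` and leaves the pointer NULL when nothing matches, which `xe_info_init()` turns into `-ENODEV` for graphics and "none" for media. The lookup can be sketched in plain C as below. The table entries are copied from the graphics table in the diff, but the type and function names (`sketch_ip`, `sketch_ip_lookup`) are hypothetical, and the descriptor pointer is omitted for brevity.

```c
#include <stddef.h>

/* Hypothetical reduced form of struct xe_ip: version x100 plus a name. */
struct sketch_ip {
	unsigned int verx100;
	const char *name;
};

/* Entries mirror the GMDID-based graphics IP table shown in the diff. */
static const struct sketch_ip sketch_graphics_ips[] = {
	{ 1270, "Xe_LPG" },
	{ 1271, "Xe_LPG" },
	{ 1274, "Xe_LPG+" },
	{ 2001, "Xe2_HPG" },
	{ 2004, "Xe2_LPG" },
	{ 3000, "Xe3_LPG" },
	{ 3001, "Xe3_LPG" },
};

#define SKETCH_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Return the matching entry, or NULL if the version is unknown. */
static const struct sketch_ip *
sketch_ip_lookup(const struct sketch_ip *table, size_t count, unsigned int ver)
{
	for (size_t i = 0; i < count; i++) {
		if (table[i].verx100 == ver)
			return &table[i];
	}

	return NULL;
}
```

Keeping the name and version next to the descriptor pointer is what allows several versions (for example 2001, 2004 and 3000) to share one feature descriptor while still reporting distinct names.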

+51

drivers/gpu/drm/xe/xe_pci_sriov.c

··· 62 62 xe_gt_sriov_pf_control_trigger_flr(gt, n); 63 63 } 64 64 65 + static struct pci_dev *xe_pci_pf_get_vf_dev(struct xe_device *xe, unsigned int vf_id) 66 + { 67 + struct pci_dev *pdev = to_pci_dev(xe->drm.dev); 68 + 69 + xe_assert(xe, IS_SRIOV_PF(xe)); 70 + 71 + /* caller must use pci_dev_put() */ 72 + return pci_get_domain_bus_and_slot(pci_domain_nr(pdev->bus), 73 + pdev->bus->number, 74 + pci_iov_virtfn_devfn(pdev, vf_id)); 75 + } 76 + 77 + static void pf_link_vfs(struct xe_device *xe, int num_vfs) 78 + { 79 + struct pci_dev *pdev_pf = to_pci_dev(xe->drm.dev); 80 + struct device_link *link; 81 + struct pci_dev *pdev_vf; 82 + unsigned int n; 83 + 84 + /* 85 + * When both PF and VF devices are enabled on the host, during system 86 + * resume they are resuming in parallel. 87 + * 88 + * But PF has to complete the provision of VF first to allow any VFs to 89 + * successfully resume. 90 + * 91 + * Create a parent-child device link between PF and VF devices that will 92 + * enforce correct resume order. 93 + */ 94 + for (n = 1; n <= num_vfs; n++) { 95 + pdev_vf = xe_pci_pf_get_vf_dev(xe, n - 1); 96 + 97 + /* unlikely, something weird is happening, abort */ 98 + if (!pdev_vf) { 99 + xe_sriov_err(xe, "Cannot find VF%u device, aborting link%s creation!\n", 100 + n, str_plural(num_vfs)); 101 + break; 102 + } 103 + 104 + link = device_link_add(&pdev_vf->dev, &pdev_pf->dev, 105 + DL_FLAG_AUTOREMOVE_CONSUMER); 106 + /* unlikely and harmless, continue with other VFs */ 107 + if (!link) 108 + xe_sriov_notice(xe, "Failed linking VF%u\n", n); 109 + 110 + pci_dev_put(pdev_vf); 111 + } 112 + } 113 + 65 114 static int pf_enable_vfs(struct xe_device *xe, int num_vfs) 66 115 { 67 116 struct pci_dev *pdev = to_pci_dev(xe->drm.dev); ··· 140 91 err = pci_enable_sriov(pdev, num_vfs); 141 92 if (err < 0) 142 93 goto failed; 94 + 95 + pf_link_vfs(xe, num_vfs); 143 96 144 97 xe_sriov_info(xe, "Enabled %u of %u VF%s\n", 145 98 num_vfs, total_vfs, str_plural(total_vfs));

+4 -11

drivers/gpu/drm/xe/xe_pci_types.h

··· 9 9 #include <linux/types.h> 10 10 11 11 struct xe_graphics_desc { 12 - const char *name; 13 - u8 ver; 14 - u8 rel; 15 - 16 12 u8 va_bits; 17 13 u8 vm_max_level; 18 14 u8 vram_flags; ··· 24 28 }; 25 29 26 30 struct xe_media_desc { 27 - const char *name; 28 - u8 ver; 29 - u8 rel; 30 - 31 31 u64 hw_engine_mask; /* hardware engines provided by media IP */ 32 32 33 33 u8 has_indirect_ring_state:1; 34 34 }; 35 35 36 - struct gmdid_map { 37 - unsigned int ver; 38 - const void *ip; 36 + struct xe_ip { 37 + unsigned int verx100; 38 + const char *name; 39 + const void *desc; 39 40 }; 40 41 41 42 #endif

+166 -9

drivers/gpu/drm/xe/xe_pmu.c

··· 7 7 #include <linux/device.h> 8 8 9 9 #include "xe_device.h" 10 + #include "xe_force_wake.h" 10 11 #include "xe_gt_idle.h" 12 + #include "xe_guc_engine_activity.h" 13 + #include "xe_hw_engine.h" 11 14 #include "xe_pm.h" 12 15 #include "xe_pmu.h" 13 16 14 17 /** 15 18 * DOC: Xe PMU (Performance Monitoring Unit) 16 19 * 17 - * Expose events/counters like GT-C6 residency and GT frequency to user land via 18 - * the perf interface. Events are per device. The GT can be selected with an 19 - * extra config sub-field (bits 60-63). 20 + * Expose events/counters like GT-C6 residency, GT frequency and per-class-engine 21 + * activity to user land via the perf interface. Events are per device. 20 22 * 21 23 * All events are listed in sysfs: 22 24 * ··· 26 24 * $ ls /sys/bus/event_source/devices/xe_0000_00_02.0/events/ 27 25 * $ ls /sys/bus/event_source/devices/xe_0000_00_02.0/format/ 28 26 * 29 - * The format directory has info regarding the configs that can be used. 27 + * The following format parameters are available to read events, 28 + * but only few are valid with each event: 29 + * 30 + * gt[60:63] Selects gt for the event 31 + * engine_class[20:27] Selects engine-class for event 32 + * engine_instance[12:19] Selects the engine-instance for the event 33 + * 34 + * For engine specific events (engine-*), gt, engine_class and engine_instance parameters must be 35 + * set as populated by DRM_XE_DEVICE_QUERY_ENGINES. 36 + * 37 + * For gt specific events (gt-*) gt parameter must be passed. All other parameters will be 0. 38 + * 30 39 * The standard perf tool can be used to grep for a certain event as well. 
31 40 * Example: 32 41 * ··· 48 35 * $ perf stat -e <event_name,gt=> -I <interval> 49 36 */ 50 37 51 - #define XE_PMU_EVENT_GT_MASK GENMASK_ULL(63, 60) 52 - #define XE_PMU_EVENT_ID_MASK GENMASK_ULL(11, 0) 38 + #define XE_PMU_EVENT_GT_MASK GENMASK_ULL(63, 60) 39 + #define XE_PMU_EVENT_ENGINE_CLASS_MASK GENMASK_ULL(27, 20) 40 + #define XE_PMU_EVENT_ENGINE_INSTANCE_MASK GENMASK_ULL(19, 12) 41 + #define XE_PMU_EVENT_ID_MASK GENMASK_ULL(11, 0) 53 42 54 43 static unsigned int config_to_event_id(u64 config) 55 44 { 56 45 return FIELD_GET(XE_PMU_EVENT_ID_MASK, config); 46 + } 47 + 48 + static unsigned int config_to_engine_class(u64 config) 49 + { 50 + return FIELD_GET(XE_PMU_EVENT_ENGINE_CLASS_MASK, config); 51 + } 52 + 53 + static unsigned int config_to_engine_instance(u64 config) 54 + { 55 + return FIELD_GET(XE_PMU_EVENT_ENGINE_INSTANCE_MASK, config); 57 56 } 58 57 59 58 static unsigned int config_to_gt_id(u64 config) ··· 73 48 return FIELD_GET(XE_PMU_EVENT_GT_MASK, config); 74 49 } 75 50 76 - #define XE_PMU_EVENT_GT_C6_RESIDENCY 0x01 51 + #define XE_PMU_EVENT_GT_C6_RESIDENCY 0x01 52 + #define XE_PMU_EVENT_ENGINE_ACTIVE_TICKS 0x02 53 + #define XE_PMU_EVENT_ENGINE_TOTAL_TICKS 0x03 77 54 78 55 static struct xe_gt *event_to_gt(struct perf_event *event) 79 56 { ··· 83 56 u64 gt = config_to_gt_id(event->attr.config); 84 57 85 58 return xe_device_get_gt(xe, gt); 59 + } 60 + 61 + static struct xe_hw_engine *event_to_hwe(struct perf_event *event) 62 + { 63 + struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base); 64 + struct drm_xe_engine_class_instance eci; 65 + u64 config = event->attr.config; 66 + struct xe_hw_engine *hwe; 67 + 68 + eci.engine_class = config_to_engine_class(config); 69 + eci.engine_instance = config_to_engine_instance(config); 70 + eci.gt_id = config_to_gt_id(config); 71 + 72 + hwe = xe_hw_engine_lookup(xe, eci); 73 + if (!hwe || xe_hw_engine_is_reserved(hwe)) 74 + return NULL; 75 + 76 + return hwe; 77 + } 78 + 79 + static bool 
is_engine_event(u64 config) 80 + { 81 + unsigned int event_id = config_to_event_id(config); 82 + 83 + return (event_id == XE_PMU_EVENT_ENGINE_TOTAL_TICKS || 84 + event_id == XE_PMU_EVENT_ENGINE_ACTIVE_TICKS); 85 + } 86 + 87 + static bool event_gt_forcewake(struct perf_event *event) 88 + { 89 + struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base); 90 + u64 config = event->attr.config; 91 + struct xe_gt *gt; 92 + unsigned int *fw_ref; 93 + 94 + if (!is_engine_event(config)) 95 + return true; 96 + 97 + gt = xe_device_get_gt(xe, config_to_gt_id(config)); 98 + 99 + fw_ref = kzalloc(sizeof(*fw_ref), GFP_KERNEL); 100 + if (!fw_ref) 101 + return false; 102 + 103 + *fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT); 104 + if (!*fw_ref) { 105 + kfree(fw_ref); 106 + return false; 107 + } 108 + 109 + event->pmu_private = fw_ref; 110 + 111 + return true; 86 112 } 87 113 88 114 static bool event_supported(struct xe_pmu *pmu, unsigned int gt, ··· 148 68 pmu->supported_events & BIT_ULL(id); 149 69 } 150 70 71 + static bool event_param_valid(struct perf_event *event) 72 + { 73 + struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base); 74 + unsigned int engine_class, engine_instance; 75 + u64 config = event->attr.config; 76 + struct xe_gt *gt; 77 + 78 + gt = xe_device_get_gt(xe, config_to_gt_id(config)); 79 + if (!gt) 80 + return false; 81 + 82 + engine_class = config_to_engine_class(config); 83 + engine_instance = config_to_engine_instance(config); 84 + 85 + switch (config_to_event_id(config)) { 86 + case XE_PMU_EVENT_GT_C6_RESIDENCY: 87 + if (engine_class || engine_instance) 88 + return false; 89 + break; 90 + case XE_PMU_EVENT_ENGINE_ACTIVE_TICKS: 91 + case XE_PMU_EVENT_ENGINE_TOTAL_TICKS: 92 + if (!event_to_hwe(event)) 93 + return false; 94 + break; 95 + } 96 + 97 + return true; 98 + } 99 + 151 100 static void xe_pmu_event_destroy(struct perf_event *event) 152 101 { 153 102 struct xe_device *xe = container_of(event->pmu, typeof(*xe), pmu.base); 
103 + struct xe_gt *gt; 104 + unsigned int *fw_ref = event->pmu_private; 105 + 106 + if (fw_ref) { 107 + gt = xe_device_get_gt(xe, config_to_gt_id(event->attr.config)); 108 + xe_force_wake_put(gt_to_fw(gt), *fw_ref); 109 + kfree(fw_ref); 110 + event->pmu_private = NULL; 111 + } 154 112 155 113 drm_WARN_ON(&xe->drm, event->parent); 156 114 xe_pm_runtime_put(xe); ··· 222 104 if (has_branch_stack(event)) 223 105 return -EOPNOTSUPP; 224 106 107 + if (!event_param_valid(event)) 108 + return -ENOENT; 109 + 225 110 if (!event->parent) { 226 111 drm_dev_get(&xe->drm); 227 112 xe_pm_runtime_get(xe); 113 + if (!event_gt_forcewake(event)) { 114 + xe_pm_runtime_put(xe); 115 + drm_dev_put(&xe->drm); 116 + return -EINVAL; 117 + } 228 118 event->destroy = xe_pmu_event_destroy; 229 119 } 230 120 231 121 return 0; 122 + } 123 + 124 + static u64 read_engine_events(struct xe_gt *gt, struct perf_event *event) 125 + { 126 + struct xe_hw_engine *hwe; 127 + u64 val = 0; 128 + 129 + hwe = event_to_hwe(event); 130 + if (config_to_event_id(event->attr.config) == XE_PMU_EVENT_ENGINE_ACTIVE_TICKS) 131 + val = xe_guc_engine_activity_active_ticks(&gt->uc.guc, hwe); 132 + else 133 + val = xe_guc_engine_activity_total_ticks(&gt->uc.guc, hwe); 134 + 135 + return val; 232 136 } 233 137 234 138 static u64 __xe_pmu_event_read(struct perf_event *event) ··· 263 123 switch (config_to_event_id(event->attr.config)) { 264 124 case XE_PMU_EVENT_GT_C6_RESIDENCY: 265 125 return xe_gt_idle_residency_msec(&gt->gtidle); 126 + case XE_PMU_EVENT_ENGINE_ACTIVE_TICKS: 127 + case XE_PMU_EVENT_ENGINE_TOTAL_TICKS: 128 + return read_engine_events(gt, event); 266 129 } 267 130 268 131 return 0; ··· 350 207 xe_pmu_event_stop(event, PERF_EF_UPDATE); 351 208 } 352 209 353 - PMU_FORMAT_ATTR(gt, "config:60-63"); 354 - PMU_FORMAT_ATTR(event, "config:0-11"); 210 + PMU_FORMAT_ATTR(gt, "config:60-63"); 211 + PMU_FORMAT_ATTR(engine_class, "config:20-27"); 212 + PMU_FORMAT_ATTR(engine_instance, "config:12-19"); 213 + 
PMU_FORMAT_ATTR(event, "config:0-11"); 355 214 356 215 static struct attribute *pmu_format_attrs[] = { 357 216 &format_attr_event.attr, 217 + &format_attr_engine_class.attr, 218 + &format_attr_engine_instance.attr, 358 219 &format_attr_gt.attr, 359 220 NULL, 360 221 }; ··· 417 270 XE_EVENT_ATTR_GROUP(v_, id_, &pmu_event_ ##v_.attr.attr) 418 271 419 272 XE_EVENT_ATTR_SIMPLE(gt-c6-residency, gt_c6_residency, XE_PMU_EVENT_GT_C6_RESIDENCY, "ms"); 273 + XE_EVENT_ATTR_NOUNIT(engine-active-ticks, engine_active_ticks, XE_PMU_EVENT_ENGINE_ACTIVE_TICKS); 274 + XE_EVENT_ATTR_NOUNIT(engine-total-ticks, engine_total_ticks, XE_PMU_EVENT_ENGINE_TOTAL_TICKS); 420 275 421 276 static struct attribute *pmu_empty_event_attrs[] = { 422 277 /* Empty - all events are added as groups with .attr_update() */ ··· 432 283 433 284 static const struct attribute_group *pmu_events_attr_update[] = { 434 285 &pmu_group_gt_c6_residency, 286 + &pmu_group_engine_active_ticks, 287 + &pmu_group_engine_total_ticks, 435 288 NULL, 436 289 }; 437 290 438 291 static void set_supported_events(struct xe_pmu *pmu) 439 292 { 440 293 struct xe_device *xe = container_of(pmu, typeof(*xe), pmu); 294 + struct xe_gt *gt = xe_device_get_gt(xe, 0); 441 295 442 296 if (!xe->info.skip_guc_pc) 443 297 pmu->supported_events |= BIT_ULL(XE_PMU_EVENT_GT_C6_RESIDENCY); 298 + 299 + if (xe_guc_engine_activity_supported(&gt->uc.guc)) { 300 + pmu->supported_events |= BIT_ULL(XE_PMU_EVENT_ENGINE_ACTIVE_TICKS); 301 + pmu->supported_events |= BIT_ULL(XE_PMU_EVENT_ENGINE_TOTAL_TICKS); 302 + } 444 303 } 445 304 446 305 /**
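
Note on the PMU config layout: the doc comment and the `PMU_FORMAT_ATTR` lines place the event id in bits 0-11, `engine_instance` in bits 12-19, `engine_class` in bits 20-27 and `gt` in bits 60-63 of `config`. A minimal userspace sketch of packing and unpacking those fields follows; the `sketch_*` names are hypothetical, and the driver itself uses `GENMASK_ULL()` with `FIELD_GET()` rather than open-coded shifts.

```c
#include <stdint.h>

/* Bit positions taken from the format attributes in the diff. */
#define SKETCH_EVENT_SHIFT		0	/* event[0:11] */
#define SKETCH_ENGINE_INSTANCE_SHIFT	12	/* engine_instance[12:19] */
#define SKETCH_ENGINE_CLASS_SHIFT	20	/* engine_class[20:27] */
#define SKETCH_GT_SHIFT			60	/* gt[60:63] */

/* Hypothetical helper: build a perf config value from its four fields. */
static uint64_t sketch_pmu_config(uint64_t event, uint64_t engine_class,
				  uint64_t engine_instance, uint64_t gt)
{
	return ((event & 0xfff) << SKETCH_EVENT_SHIFT) |
	       ((engine_instance & 0xff) << SKETCH_ENGINE_INSTANCE_SHIFT) |
	       ((engine_class & 0xff) << SKETCH_ENGINE_CLASS_SHIFT) |
	       ((gt & 0xf) << SKETCH_GT_SHIFT);
}

/* Hypothetical decoders mirroring the config_to_*() helpers. */
static unsigned int sketch_config_to_event_id(uint64_t config)
{
	return (config >> SKETCH_EVENT_SHIFT) & 0xfff;
}

static unsigned int sketch_config_to_engine_instance(uint64_t config)
{
	return (config >> SKETCH_ENGINE_INSTANCE_SHIFT) & 0xff;
}

static unsigned int sketch_config_to_engine_class(uint64_t config)
{
	return (config >> SKETCH_ENGINE_CLASS_SHIFT) & 0xff;
}

static unsigned int sketch_config_to_gt_id(uint64_t config)
{
	return (config >> SKETCH_GT_SHIFT) & 0xf;
}
```

For a gt-only event such as `gt-c6-residency` (id 0x01) the engine fields stay 0, which matches the `event_param_valid()` check that rejects a non-zero engine class or instance for that event.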

+397 -98

drivers/gpu/drm/xe/xe_pt.c

··· 20 20 #include "xe_res_cursor.h" 21 21 #include "xe_sched_job.h" 22 22 #include "xe_sync.h" 23 + #include "xe_svm.h" 23 24 #include "xe_trace.h" 24 25 #include "xe_ttm_stolen_mgr.h" 25 26 #include "xe_vm.h" ··· 29 28 struct xe_pt pt; 30 29 /** @children: Array of page-table child nodes */ 31 30 struct xe_ptw *children[XE_PDES]; 31 + /** @staging: Array of page-table staging nodes */ 32 + struct xe_ptw *staging[XE_PDES]; 32 33 }; 33 34 34 35 #if IS_ENABLED(CONFIG_DRM_XE_DEBUG_VM) ··· 51 48 return container_of(pt, struct xe_pt_dir, pt); 52 49 } 53 50 54 - static struct xe_pt *xe_pt_entry(struct xe_pt_dir *pt_dir, unsigned int index) 51 + static struct xe_pt * 52 + xe_pt_entry_staging(struct xe_pt_dir *pt_dir, unsigned int index) 55 53 { 56 - return container_of(pt_dir->children[index], struct xe_pt, base); 54 + return container_of(pt_dir->staging[index], struct xe_pt, base); 57 55 } 58 56 59 57 static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm, ··· 129 125 } 130 126 pt->bo = bo; 131 127 pt->base.children = level ? as_xe_pt_dir(pt)->children : NULL; 128 + pt->base.staging = level ? as_xe_pt_dir(pt)->staging : NULL; 132 129 133 130 if (vm->xef) 134 131 xe_drm_client_add_bo(vm->xef->client, pt->bo); ··· 211 206 struct xe_pt_dir *pt_dir = as_xe_pt_dir(pt); 212 207 213 208 for (i = 0; i < XE_PDES; i++) { 214 - if (xe_pt_entry(pt_dir, i)) 215 - xe_pt_destroy(xe_pt_entry(pt_dir, i), flags, 209 + if (xe_pt_entry_staging(pt_dir, i)) 210 + xe_pt_destroy(xe_pt_entry_staging(pt_dir, i), flags, 216 211 deferred); 217 212 } 218 213 } 219 214 xe_pt_free(pt); 215 + } 216 + 217 + /** 218 + * xe_pt_clear() - Clear a page-table. 219 + * @xe: xe device. 220 + * @pt: The page-table. 221 + * 222 + * Clears page-table by setting to zero. 
223 + */ 224 + void xe_pt_clear(struct xe_device *xe, struct xe_pt *pt) 225 + { 226 + struct iosys_map *map = &pt->bo->vmap; 227 + 228 + xe_map_memset(xe, map, 0, 0, SZ_4K); 220 229 } 221 230 222 231 /** ··· 395 376 /* Continue building a non-connected subtree. */ 396 377 struct iosys_map *map = &parent->bo->vmap; 397 378 398 - if (unlikely(xe_child)) 379 + if (unlikely(xe_child)) { 399 380 parent->base.children[offset] = &xe_child->base; 381 + parent->base.staging[offset] = &xe_child->base; 382 + } 400 383 401 384 xe_pt_write(xe_walk->vm->xe, map, offset, pte); 402 385 parent->num_live++; ··· 608 587 * range. 609 588 * @tile: The tile we're building for. 610 589 * @vma: The vma indicating the address range. 590 + * @range: The range indicating the address range. 611 591 * @entries: Storage for the update entries used for connecting the tree to 612 592 * the main tree at commit time. 613 593 * @num_entries: On output contains the number of @entries used. ··· 624 602 */ 625 603 static int 626 604 xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma, 605 + struct xe_svm_range *range, 627 606 struct xe_vm_pgtable_update *entries, u32 *num_entries) 628 607 { 629 608 struct xe_device *xe = tile_to_xe(tile); ··· 637 614 .ops = &xe_pt_stage_bind_ops, 638 615 .shifts = xe_normal_pt_shifts, 639 616 .max_level = XE_PT_HIGHEST_LEVEL, 617 + .staging = true, 640 618 }, 641 619 .vm = xe_vma_vm(vma), 642 620 .tile = tile, 643 621 .curs = &curs, 644 - .va_curs_start = xe_vma_start(vma), 622 + .va_curs_start = range ? range->base.itree.start : 623 + xe_vma_start(vma), 645 624 .vma = vma, 646 625 .wupd.entries = entries, 647 - .needs_64K = (xe_vma_vm(vma)->flags & XE_VM_FLAG_64K) && is_devmem, 648 626 }; 649 627 struct xe_pt *pt = xe_vma_vm(vma)->pt_root[tile->id]; 650 628 int ret; 629 + 630 + if (range) { 631 + /* Move this entire thing to xe_svm.c? 
*/ 632 + xe_svm_notifier_lock(xe_vma_vm(vma)); 633 + if (!xe_svm_range_pages_valid(range)) { 634 + xe_svm_range_debug(range, "BIND PREPARE - RETRY"); 635 + xe_svm_notifier_unlock(xe_vma_vm(vma)); 636 + return -EAGAIN; 637 + } 638 + if (xe_svm_range_has_dma_mapping(range)) { 639 + xe_res_first_dma(range->base.dma_addr, 0, 640 + range->base.itree.last + 1 - range->base.itree.start, 641 + &curs); 642 + is_devmem = xe_res_is_vram(&curs); 643 + if (is_devmem) 644 + xe_svm_range_debug(range, "BIND PREPARE - DMA VRAM"); 645 + else 646 + xe_svm_range_debug(range, "BIND PREPARE - DMA"); 647 + } else { 648 + xe_assert(xe, false); 649 + } 650 + /* 651 + * Note, when unlocking the resource cursor dma addresses may become 652 + * stale, but the bind will be aborted anyway at commit time. 653 + */ 654 + xe_svm_notifier_unlock(xe_vma_vm(vma)); 655 + } 656 + 657 + xe_walk.needs_64K = (xe_vma_vm(vma)->flags & XE_VM_FLAG_64K) && is_devmem; 651 658 652 659 /** 653 660 * Default atomic expectations for different allocation scenarios are as follows: ··· 700 647 * gets migrated to LMEM, bind such allocations with 701 648 * device atomics enabled. 702 649 */ 703 - else if (is_devmem && !xe_bo_has_single_placement(bo)) 650 + else if (is_devmem) 704 651 xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE; 705 652 } else { 706 653 xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE; ··· 716 663 717 664 if (is_devmem) { 718 665 xe_walk.default_pte |= XE_PPGTT_PTE_DM; 719 - xe_walk.dma_offset = vram_region_gpu_offset(bo->ttm.resource); 666 + xe_walk.dma_offset = bo ? 
vram_region_gpu_offset(bo->ttm.resource) : 0; 720 667 } 721 668 722 669 if (!xe_vma_has_no_bo(vma) && xe_bo_is_stolen(bo)) 723 670 xe_walk.dma_offset = xe_ttm_stolen_gpu_offset(xe_bo_device(bo)); 724 671 725 - xe_bo_assert_held(bo); 672 + if (!range) 673 + xe_bo_assert_held(bo); 726 674 727 - if (!xe_vma_is_null(vma)) { 675 + if (!xe_vma_is_null(vma) && !range) { 728 676 if (xe_vma_is_userptr(vma)) 729 677 xe_res_first_sg(to_userptr_vma(vma)->userptr.sg, 0, 730 678 xe_vma_size(vma), &curs); ··· 735 681 else 736 682 xe_res_first_sg(xe_bo_sg(bo), xe_vma_bo_offset(vma), 737 683 xe_vma_size(vma), &curs); 738 - } else { 684 + } else if (!range) { 739 685 curs.size = xe_vma_size(vma); 740 686 } 741 687 742 - ret = xe_pt_walk_range(&pt->base, pt->level, xe_vma_start(vma), 743 - xe_vma_end(vma), &xe_walk.base); 688 + ret = xe_pt_walk_range(&pt->base, pt->level, 689 + range ? range->base.itree.start : xe_vma_start(vma), 690 + range ? range->base.itree.last + 1 : xe_vma_end(vma), 691 + &xe_walk.base); 744 692 745 693 *num_entries = xe_walk.wupd.num_used_entries; 746 694 return ret; ··· 886 830 return xe_walk.needs_invalidate; 887 831 } 888 832 833 + /** 834 + * xe_pt_zap_ptes_range() - Zap (zero) gpu ptes of a SVM range 835 + * @tile: The tile we're zapping for. 836 + * @vm: The VM we're zapping for. 837 + * @range: The SVM range we're zapping for. 838 + * 839 + * SVM invalidation needs to be able to zap the gpu ptes of a given address 840 + * range. In order to be able to do that, that function needs access to the 841 + * shared page-table entries so it can either clear the leaf PTEs or 842 + * clear the pointers to lower-level page-tables. The caller is required 843 + * to hold the SVM notifier lock. 844 + * 845 + * Return: Whether ptes were actually updated and a TLB invalidation is 846 + * required. 
847 + */ 848 + bool xe_pt_zap_ptes_range(struct xe_tile *tile, struct xe_vm *vm, 849 + struct xe_svm_range *range) 850 + { 851 + struct xe_pt_zap_ptes_walk xe_walk = { 852 + .base = { 853 + .ops = &xe_pt_zap_ptes_ops, 854 + .shifts = xe_normal_pt_shifts, 855 + .max_level = XE_PT_HIGHEST_LEVEL, 856 + }, 857 + .tile = tile, 858 + }; 859 + struct xe_pt *pt = vm->pt_root[tile->id]; 860 + u8 pt_mask = (range->tile_present & ~range->tile_invalidated); 861 + 862 + xe_svm_assert_in_notifier(vm); 863 + 864 + if (!(pt_mask & BIT(tile->id))) 865 + return false; 866 + 867 + (void)xe_pt_walk_shared(&pt->base, pt->level, range->base.itree.start, 868 + range->base.itree.last + 1, &xe_walk.base); 869 + 870 + return xe_walk.needs_invalidate; 871 + } 872 + 889 873 static void 890 874 xe_vm_populate_pgtable(struct xe_migrate_pt_update *pt_update, struct xe_tile *tile, 891 875 struct iosys_map *map, void *data, ··· 969 873 } 970 874 } 971 875 972 - static void xe_pt_commit_locks_assert(struct xe_vma *vma) 973 - { 974 - struct xe_vm *vm = xe_vma_vm(vma); 876 + #define XE_INVALID_VMA ((struct xe_vma *)(0xdeaddeadull)) 975 877 878 + static void xe_pt_commit_prepare_locks_assert(struct xe_vma *vma) 879 + { 880 + struct xe_vm *vm; 881 + 882 + if (vma == XE_INVALID_VMA) 883 + return; 884 + 885 + vm = xe_vma_vm(vma); 976 886 lockdep_assert_held(&vm->lock); 977 887 978 - if (!xe_vma_is_userptr(vma) && !xe_vma_is_null(vma)) 888 + if (!xe_vma_has_no_bo(vma)) 979 889 dma_resv_assert_held(xe_vma_bo(vma)->ttm.base.resv); 980 890 981 891 xe_vm_assert_held(vm); 892 + } 893 + 894 + static void xe_pt_commit_locks_assert(struct xe_vma *vma) 895 + { 896 + struct xe_vm *vm; 897 + 898 + if (vma == XE_INVALID_VMA) 899 + return; 900 + 901 + vm = xe_vma_vm(vma); 902 + xe_pt_commit_prepare_locks_assert(vma); 903 + 904 + if (xe_vma_is_userptr(vma)) 905 + lockdep_assert_held_read(&vm->userptr.notifier_lock); 982 906 } 983 907 984 908 static void xe_pt_commit(struct xe_vma *vma, ··· 1011 895 1012 896 for (i = 0; 
i < num_entries; i++) { 1013 897 struct xe_pt *pt = entries[i].pt; 898 + struct xe_pt_dir *pt_dir; 1014 899 1015 900 if (!pt->level) 1016 901 continue; 1017 902 903 + pt_dir = as_xe_pt_dir(pt); 1018 904 for (j = 0; j < entries[i].qwords; j++) { 1019 905 struct xe_pt *oldpte = entries[i].pt_entries[j].pt; 906 + int j_ = j + entries[i].ofs; 1020 907 1021 - xe_pt_destroy(oldpte, xe_vma_vm(vma)->flags, deferred); 908 + pt_dir->children[j_] = pt_dir->staging[j_]; 909 + xe_pt_destroy(oldpte, (vma == XE_INVALID_VMA) ? 0 : 910 + xe_vma_vm(vma)->flags, deferred); 1022 911 } 1023 912 } 1024 913 } ··· 1034 913 { 1035 914 int i, j; 1036 915 1037 - xe_pt_commit_locks_assert(vma); 916 + xe_pt_commit_prepare_locks_assert(vma); 1038 917 1039 918 for (i = num_entries - 1; i >= 0; --i) { 1040 919 struct xe_pt *pt = entries[i].pt; ··· 1049 928 pt_dir = as_xe_pt_dir(pt); 1050 929 for (j = 0; j < entries[i].qwords; j++) { 1051 930 u32 j_ = j + entries[i].ofs; 1052 - struct xe_pt *newpte = xe_pt_entry(pt_dir, j_); 931 + struct xe_pt *newpte = xe_pt_entry_staging(pt_dir, j_); 1053 932 struct xe_pt *oldpte = entries[i].pt_entries[j].pt; 1054 933 1055 - pt_dir->children[j_] = oldpte ? &oldpte->base : 0; 934 + pt_dir->staging[j_] = oldpte ? 
&oldpte->base : 0; 1056 935 xe_pt_destroy(newpte, xe_vma_vm(vma)->flags, NULL); 1057 936 } 1058 937 } ··· 1064 943 { 1065 944 u32 i, j; 1066 945 1067 - xe_pt_commit_locks_assert(vma); 946 + xe_pt_commit_prepare_locks_assert(vma); 1068 947 1069 948 for (i = 0; i < num_entries; i++) { 1070 949 struct xe_pt *pt = entries[i].pt; ··· 1082 961 struct xe_pt *newpte = entries[i].pt_entries[j].pt; 1083 962 struct xe_pt *oldpte = NULL; 1084 963 1085 - if (xe_pt_entry(pt_dir, j_)) 1086 - oldpte = xe_pt_entry(pt_dir, j_); 964 + if (xe_pt_entry_staging(pt_dir, j_)) 965 + oldpte = xe_pt_entry_staging(pt_dir, j_); 1087 966 1088 - pt_dir->children[j_] = &newpte->base; 967 + pt_dir->staging[j_] = &newpte->base; 1089 968 entries[i].pt_entries[j].pt = oldpte; 1090 969 } 1091 970 } ··· 1102 981 1103 982 static int 1104 983 xe_pt_prepare_bind(struct xe_tile *tile, struct xe_vma *vma, 984 + struct xe_svm_range *range, 1105 985 struct xe_vm_pgtable_update *entries, u32 *num_entries) 1106 986 { 1107 987 int err; 1108 988 1109 989 *num_entries = 0; 1110 - err = xe_pt_stage_bind(tile, vma, entries, num_entries); 990 + err = xe_pt_stage_bind(tile, vma, range, entries, num_entries); 1111 991 if (!err) 1112 992 xe_tile_assert(tile, *num_entries); 1113 993 ··· 1191 1069 { 1192 1070 int err = 0; 1193 1071 1072 + /* 1073 + * No need to check for is_cpu_addr_mirror here as vma_add_deps is a 1074 + * NOP if VMA is_cpu_addr_mirror 1075 + */ 1076 + 1194 1077 switch (op->base.op) { 1195 1078 case DRM_GPUVA_OP_MAP: 1196 1079 if (!op->map.immediate && xe_vm_in_fault_mode(vm)) ··· 1213 1086 break; 1214 1087 case DRM_GPUVA_OP_PREFETCH: 1215 1088 err = vma_add_deps(gpuva_to_vma(op->base.prefetch.va), job); 1089 + break; 1090 + case DRM_GPUVA_OP_DRIVER: 1216 1091 break; 1217 1092 default: 1218 1093 drm_warn(&vm->xe->drm, "NOT POSSIBLE"); ··· 1342 1213 return 0; 1343 1214 1344 1215 uvma = to_userptr_vma(vma); 1216 + if (xe_pt_userptr_inject_eagain(uvma)) 1217 + xe_vma_userptr_force_invalidate(uvma); 1218 + 
1345 1219 notifier_seq = uvma->userptr.notifier_seq; 1346 1220 1347 - if (uvma->userptr.initial_bind && !xe_vm_in_fault_mode(vm)) 1348 - return 0; 1349 - 1350 1221 if (!mmu_interval_read_retry(&uvma->userptr.notifier, 1351 - notifier_seq) && 1352 - !xe_pt_userptr_inject_eagain(uvma)) 1222 + notifier_seq)) 1353 1223 return 0; 1354 1224 1355 - if (xe_vm_in_fault_mode(vm)) { 1225 + if (xe_vm_in_fault_mode(vm)) 1356 1226 return -EAGAIN; 1357 - } else { 1358 - spin_lock(&vm->userptr.invalidated_lock); 1359 - list_move_tail(&uvma->userptr.invalidate_link, 1360 - &vm->userptr.invalidated); 1361 - spin_unlock(&vm->userptr.invalidated_lock); 1362 1227 1363 - if (xe_vm_in_preempt_fence_mode(vm)) { 1364 - struct dma_resv_iter cursor; 1365 - struct dma_fence *fence; 1366 - long err; 1367 - 1368 - dma_resv_iter_begin(&cursor, xe_vm_resv(vm), 1369 - DMA_RESV_USAGE_BOOKKEEP); 1370 - dma_resv_for_each_fence_unlocked(&cursor, fence) 1371 - dma_fence_enable_sw_signaling(fence); 1372 - dma_resv_iter_end(&cursor); 1373 - 1374 - err = dma_resv_wait_timeout(xe_vm_resv(vm), 1375 - DMA_RESV_USAGE_BOOKKEEP, 1376 - false, MAX_SCHEDULE_TIMEOUT); 1377 - XE_WARN_ON(err <= 0); 1378 - } 1379 - } 1380 - 1228 + /* 1229 + * Just continue the operation since exec or rebind worker 1230 + * will take care of rebinding. 
1231 + */ 1381 1232 return 0; 1382 1233 } 1383 1234 ··· 1418 1309 } 1419 1310 1420 1311 return err; 1312 + } 1313 + 1314 + static int xe_pt_svm_pre_commit(struct xe_migrate_pt_update *pt_update) 1315 + { 1316 + struct xe_vm *vm = pt_update->vops->vm; 1317 + struct xe_vma_ops *vops = pt_update->vops; 1318 + struct xe_vma_op *op; 1319 + int err; 1320 + 1321 + err = xe_pt_pre_commit(pt_update); 1322 + if (err) 1323 + return err; 1324 + 1325 + xe_svm_notifier_lock(vm); 1326 + 1327 + list_for_each_entry(op, &vops->list, link) { 1328 + struct xe_svm_range *range = op->map_range.range; 1329 + 1330 + if (op->subop == XE_VMA_SUBOP_UNMAP_RANGE) 1331 + continue; 1332 + 1333 + xe_svm_range_debug(range, "PRE-COMMIT"); 1334 + 1335 + xe_assert(vm->xe, xe_vma_is_cpu_addr_mirror(op->map_range.vma)); 1336 + xe_assert(vm->xe, op->subop == XE_VMA_SUBOP_MAP_RANGE); 1337 + 1338 + if (!xe_svm_range_pages_valid(range)) { 1339 + xe_svm_range_debug(range, "PRE-COMMIT - RETRY"); 1340 + xe_svm_notifier_unlock(vm); 1341 + return -EAGAIN; 1342 + } 1343 + } 1344 + 1345 + return 0; 1421 1346 } 1422 1347 1423 1348 struct invalidation_fence { ··· 1639 1496 * xe_pt_stage_unbind() - Build page-table update structures for an unbind 1640 1497 * operation 1641 1498 * @tile: The tile we're unbinding for. 1499 + * @vm: The vm 1642 1500 * @vma: The vma we're unbinding. 1501 + * @range: The range we're unbinding. 1643 1502 * @entries: Caller-provided storage for the update structures. 1644 1503 * 1645 1504 * Builds page-table update structures for an unbind operation. The function ··· 1651 1506 * 1652 1507 * Return: The number of entries used. 1653 1508 */ 1654 - static unsigned int xe_pt_stage_unbind(struct xe_tile *tile, struct xe_vma *vma, 1509 + static unsigned int xe_pt_stage_unbind(struct xe_tile *tile, 1510 + struct xe_vm *vm, 1511 + struct xe_vma *vma, 1512 + struct xe_svm_range *range, 1655 1513 struct xe_vm_pgtable_update *entries) 1656 1514 { 1515 + u64 start = range ? 
range->base.itree.start : xe_vma_start(vma); 1516 + u64 end = range ? range->base.itree.last + 1 : xe_vma_end(vma); 1657 1517 struct xe_pt_stage_unbind_walk xe_walk = { 1658 1518 .base = { 1659 1519 .ops = &xe_pt_stage_unbind_ops, 1660 1520 .shifts = xe_normal_pt_shifts, 1661 1521 .max_level = XE_PT_HIGHEST_LEVEL, 1522 + .staging = true, 1662 1523 }, 1663 1524 .tile = tile, 1664 - .modified_start = xe_vma_start(vma), 1665 - .modified_end = xe_vma_end(vma), 1525 + .modified_start = start, 1526 + .modified_end = end, 1666 1527 .wupd.entries = entries, 1667 1528 }; 1668 - struct xe_pt *pt = xe_vma_vm(vma)->pt_root[tile->id]; 1529 + struct xe_pt *pt = vm->pt_root[tile->id]; 1669 1530 1670 - (void)xe_pt_walk_shared(&pt->base, pt->level, xe_vma_start(vma), 1671 - xe_vma_end(vma), &xe_walk.base); 1531 + (void)xe_pt_walk_shared(&pt->base, pt->level, start, end, 1532 + &xe_walk.base); 1672 1533 1673 1534 return xe_walk.wupd.num_used_entries; 1674 1535 } ··· 1706 1555 { 1707 1556 int i, j; 1708 1557 1709 - xe_pt_commit_locks_assert(vma); 1558 + xe_pt_commit_prepare_locks_assert(vma); 1710 1559 1711 1560 for (i = num_entries - 1; i >= 0; --i) { 1712 1561 struct xe_vm_pgtable_update *entry = &entries[i]; ··· 1719 1568 continue; 1720 1569 1721 1570 for (j = entry->ofs; j < entry->ofs + entry->qwords; j++) 1722 - pt_dir->children[j] = 1571 + pt_dir->staging[j] = 1723 1572 entries[i].pt_entries[j - entry->ofs].pt ? 
1724 1573 &entries[i].pt_entries[j - entry->ofs].pt->base : NULL; 1725 1574 } ··· 1732 1581 { 1733 1582 int i, j; 1734 1583 1735 - xe_pt_commit_locks_assert(vma); 1584 + xe_pt_commit_prepare_locks_assert(vma); 1736 1585 1737 1586 for (i = 0; i < num_entries; ++i) { 1738 1587 struct xe_vm_pgtable_update *entry = &entries[i]; ··· 1746 1595 pt_dir = as_xe_pt_dir(pt); 1747 1596 for (j = entry->ofs; j < entry->ofs + entry->qwords; j++) { 1748 1597 entry->pt_entries[j - entry->ofs].pt = 1749 - xe_pt_entry(pt_dir, j); 1750 - pt_dir->children[j] = NULL; 1598 + xe_pt_entry_staging(pt_dir, j); 1599 + pt_dir->staging[j] = NULL; 1751 1600 } 1752 1601 } 1753 1602 } 1754 1603 1755 1604 static void 1756 1605 xe_pt_update_ops_rfence_interval(struct xe_vm_pgtable_update_ops *pt_update_ops, 1757 - struct xe_vma *vma) 1606 + u64 start, u64 end) 1758 1607 { 1608 + u64 last; 1759 1609 u32 current_op = pt_update_ops->current_op; 1760 1610 struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[current_op]; 1761 1611 int i, level = 0; 1762 - u64 start, last; 1763 1612 1764 1613 for (i = 0; i < pt_op->num_entries; i++) { 1765 1614 const struct xe_vm_pgtable_update *entry = &pt_op->entries[i]; ··· 1769 1618 } 1770 1619 1771 1620 /* Greedy (non-optimal) calculation but simple */ 1772 - start = ALIGN_DOWN(xe_vma_start(vma), 0x1ull << xe_pt_shift(level)); 1773 - last = ALIGN(xe_vma_end(vma), 0x1ull << xe_pt_shift(level)) - 1; 1621 + start = ALIGN_DOWN(start, 0x1ull << xe_pt_shift(level)); 1622 + last = ALIGN(end, 0x1ull << xe_pt_shift(level)) - 1; 1774 1623 1775 1624 if (start < pt_update_ops->start) 1776 1625 pt_update_ops->start = start; ··· 1797 1646 struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[current_op]; 1798 1647 int err; 1799 1648 1649 + xe_tile_assert(tile, !xe_vma_is_cpu_addr_mirror(vma)); 1800 1650 xe_bo_assert_held(xe_vma_bo(vma)); 1801 1651 1802 1652 vm_dbg(&xe_vma_vm(vma)->xe->drm, ··· 1812 1660 if (err) 1813 1661 return err; 1814 1662 1815 - err = 
xe_pt_prepare_bind(tile, vma, pt_op->entries, 1663 + err = xe_pt_prepare_bind(tile, vma, NULL, pt_op->entries, 1816 1664 &pt_op->num_entries); 1817 1665 if (!err) { 1818 1666 xe_tile_assert(tile, pt_op->num_entries <= ··· 1820 1668 xe_vm_dbg_print_entries(tile_to_xe(tile), pt_op->entries, 1821 1669 pt_op->num_entries, true); 1822 1670 1823 - xe_pt_update_ops_rfence_interval(pt_update_ops, vma); 1671 + xe_pt_update_ops_rfence_interval(pt_update_ops, 1672 + xe_vma_start(vma), 1673 + xe_vma_end(vma)); 1824 1674 ++pt_update_ops->current_op; 1825 1675 pt_update_ops->needs_userptr_lock |= xe_vma_is_userptr(vma); 1826 1676 ··· 1856 1702 return err; 1857 1703 } 1858 1704 1705 + static int bind_range_prepare(struct xe_vm *vm, struct xe_tile *tile, 1706 + struct xe_vm_pgtable_update_ops *pt_update_ops, 1707 + struct xe_vma *vma, struct xe_svm_range *range) 1708 + { 1709 + u32 current_op = pt_update_ops->current_op; 1710 + struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[current_op]; 1711 + int err; 1712 + 1713 + xe_tile_assert(tile, xe_vma_is_cpu_addr_mirror(vma)); 1714 + 1715 + vm_dbg(&xe_vma_vm(vma)->xe->drm, 1716 + "Preparing bind, with range [%lx...%lx)\n", 1717 + range->base.itree.start, range->base.itree.last); 1718 + 1719 + pt_op->vma = NULL; 1720 + pt_op->bind = true; 1721 + pt_op->rebind = BIT(tile->id) & range->tile_present; 1722 + 1723 + err = xe_pt_prepare_bind(tile, vma, range, pt_op->entries, 1724 + &pt_op->num_entries); 1725 + if (!err) { 1726 + xe_tile_assert(tile, pt_op->num_entries <= 1727 + ARRAY_SIZE(pt_op->entries)); 1728 + xe_vm_dbg_print_entries(tile_to_xe(tile), pt_op->entries, 1729 + pt_op->num_entries, true); 1730 + 1731 + xe_pt_update_ops_rfence_interval(pt_update_ops, 1732 + range->base.itree.start, 1733 + range->base.itree.last + 1); 1734 + ++pt_update_ops->current_op; 1735 + pt_update_ops->needs_svm_lock = true; 1736 + 1737 + pt_op->vma = vma; 1738 + xe_pt_commit_prepare_bind(vma, pt_op->entries, 1739 + pt_op->num_entries, 
pt_op->rebind); 1740 + } else { 1741 + xe_pt_cancel_bind(vma, pt_op->entries, pt_op->num_entries); 1742 + } 1743 + 1744 + return err; 1745 + } 1746 + 1859 1747 static int unbind_op_prepare(struct xe_tile *tile, 1860 1748 struct xe_vm_pgtable_update_ops *pt_update_ops, 1861 1749 struct xe_vma *vma) ··· 1909 1713 if (!((vma->tile_present | vma->tile_staged) & BIT(tile->id))) 1910 1714 return 0; 1911 1715 1716 + xe_tile_assert(tile, !xe_vma_is_cpu_addr_mirror(vma)); 1912 1717 xe_bo_assert_held(xe_vma_bo(vma)); 1913 1718 1914 1719 vm_dbg(&xe_vma_vm(vma)->xe->drm, 1915 1720 "Preparing unbind, with range [%llx...%llx)\n", 1916 1721 xe_vma_start(vma), xe_vma_end(vma) - 1); 1917 - 1918 - /* 1919 - * Wait for invalidation to complete. Can corrupt internal page table 1920 - * state if an invalidation is running while preparing an unbind. 1921 - */ 1922 - if (xe_vma_is_userptr(vma) && xe_vm_in_fault_mode(xe_vma_vm(vma))) 1923 - mmu_interval_read_begin(&to_userptr_vma(vma)->userptr.notifier); 1924 1722 1925 1723 pt_op->vma = vma; 1926 1724 pt_op->bind = false; ··· 1924 1734 if (err) 1925 1735 return err; 1926 1736 1927 - pt_op->num_entries = xe_pt_stage_unbind(tile, vma, pt_op->entries); 1737 + pt_op->num_entries = xe_pt_stage_unbind(tile, xe_vma_vm(vma), 1738 + vma, NULL, pt_op->entries); 1928 1739 1929 1740 xe_vm_dbg_print_entries(tile_to_xe(tile), pt_op->entries, 1930 1741 pt_op->num_entries, false); 1931 - xe_pt_update_ops_rfence_interval(pt_update_ops, vma); 1742 + xe_pt_update_ops_rfence_interval(pt_update_ops, xe_vma_start(vma), 1743 + xe_vma_end(vma)); 1932 1744 ++pt_update_ops->current_op; 1933 1745 pt_update_ops->needs_userptr_lock |= xe_vma_is_userptr(vma); 1934 1746 pt_update_ops->needs_invalidation = true; 1935 1747 1936 1748 xe_pt_commit_prepare_unbind(vma, pt_op->entries, pt_op->num_entries); 1749 + 1750 + return 0; 1751 + } 1752 + 1753 + static int unbind_range_prepare(struct xe_vm *vm, 1754 + struct xe_tile *tile, 1755 + struct xe_vm_pgtable_update_ops 
*pt_update_ops, 1756 + struct xe_svm_range *range) 1757 + { 1758 + u32 current_op = pt_update_ops->current_op; 1759 + struct xe_vm_pgtable_update_op *pt_op = &pt_update_ops->ops[current_op]; 1760 + 1761 + if (!(range->tile_present & BIT(tile->id))) 1762 + return 0; 1763 + 1764 + vm_dbg(&vm->xe->drm, 1765 + "Preparing unbind, with range [%lx...%lx)\n", 1766 + range->base.itree.start, range->base.itree.last); 1767 + 1768 + pt_op->vma = XE_INVALID_VMA; 1769 + pt_op->bind = false; 1770 + pt_op->rebind = false; 1771 + 1772 + pt_op->num_entries = xe_pt_stage_unbind(tile, vm, NULL, range, 1773 + pt_op->entries); 1774 + 1775 + xe_vm_dbg_print_entries(tile_to_xe(tile), pt_op->entries, 1776 + pt_op->num_entries, false); 1777 + xe_pt_update_ops_rfence_interval(pt_update_ops, range->base.itree.start, 1778 + range->base.itree.last + 1); 1779 + ++pt_update_ops->current_op; 1780 + pt_update_ops->needs_svm_lock = true; 1781 + pt_update_ops->needs_invalidation = true; 1782 + 1783 + xe_pt_commit_prepare_unbind(XE_INVALID_VMA, pt_op->entries, 1784 + pt_op->num_entries); 1937 1785 1938 1786 return 0; 1939 1787 } ··· 1987 1759 1988 1760 switch (op->base.op) { 1989 1761 case DRM_GPUVA_OP_MAP: 1990 - if (!op->map.immediate && xe_vm_in_fault_mode(vm)) 1762 + if ((!op->map.immediate && xe_vm_in_fault_mode(vm)) || 1763 + op->map.is_cpu_addr_mirror) 1991 1764 break; 1992 1765 1993 1766 err = bind_op_prepare(vm, tile, pt_update_ops, op->map.vma); 1994 1767 pt_update_ops->wait_vm_kernel = true; 1995 1768 break; 1996 1769 case DRM_GPUVA_OP_REMAP: 1997 - err = unbind_op_prepare(tile, pt_update_ops, 1998 - gpuva_to_vma(op->base.remap.unmap->va)); 1770 + { 1771 + struct xe_vma *old = gpuva_to_vma(op->base.remap.unmap->va); 1772 + 1773 + if (xe_vma_is_cpu_addr_mirror(old)) 1774 + break; 1775 + 1776 + err = unbind_op_prepare(tile, pt_update_ops, old); 1999 1777 2000 1778 if (!err && op->remap.prev) { 2001 1779 err = bind_op_prepare(vm, tile, pt_update_ops, ··· 2014 1780 
pt_update_ops->wait_vm_bookkeep = true; 2015 1781 } 2016 1782 break; 1783 + } 2017 1784 case DRM_GPUVA_OP_UNMAP: 2018 - err = unbind_op_prepare(tile, pt_update_ops, 2019 - gpuva_to_vma(op->base.unmap.va)); 1785 + { 1786 + struct xe_vma *vma = gpuva_to_vma(op->base.unmap.va); 1787 + 1788 + if (xe_vma_is_cpu_addr_mirror(vma)) 1789 + break; 1790 + 1791 + err = unbind_op_prepare(tile, pt_update_ops, vma); 2020 1792 break; 1793 + } 2021 1794 case DRM_GPUVA_OP_PREFETCH: 2022 - err = bind_op_prepare(vm, tile, pt_update_ops, 2023 - gpuva_to_vma(op->base.prefetch.va)); 1795 + { 1796 + struct xe_vma *vma = gpuva_to_vma(op->base.prefetch.va); 1797 + 1798 + if (xe_vma_is_cpu_addr_mirror(vma)) 1799 + break; 1800 + 1801 + err = bind_op_prepare(vm, tile, pt_update_ops, vma); 2024 1802 pt_update_ops->wait_vm_kernel = true; 1803 + break; 1804 + } 1805 + case DRM_GPUVA_OP_DRIVER: 1806 + if (op->subop == XE_VMA_SUBOP_MAP_RANGE) { 1807 + xe_assert(vm->xe, xe_vma_is_cpu_addr_mirror(op->map_range.vma)); 1808 + 1809 + err = bind_range_prepare(vm, tile, pt_update_ops, 1810 + op->map_range.vma, 1811 + op->map_range.range); 1812 + } else if (op->subop == XE_VMA_SUBOP_UNMAP_RANGE) { 1813 + err = unbind_range_prepare(vm, tile, pt_update_ops, 1814 + op->unmap_range.range); 1815 + } 2025 1816 break; 2026 1817 default: 2027 1818 drm_warn(&vm->xe->drm, "NOT POSSIBLE"); ··· 2117 1858 struct xe_vma *vma, struct dma_fence *fence, 2118 1859 struct dma_fence *fence2) 2119 1860 { 1861 + xe_tile_assert(tile, !xe_vma_is_cpu_addr_mirror(vma)); 1862 + 2120 1863 if (!xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm) { 2121 1864 dma_resv_add_fence(xe_vma_bo(vma)->ttm.base.resv, fence, 2122 1865 pt_update_ops->wait_vm_bookkeep ? 
··· 2152 1891 struct xe_vma *vma, struct dma_fence *fence, 2153 1892 struct dma_fence *fence2) 2154 1893 { 1894 + xe_tile_assert(tile, !xe_vma_is_cpu_addr_mirror(vma)); 1895 + 2155 1896 if (!xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm) { 2156 1897 dma_resv_add_fence(xe_vma_bo(vma)->ttm.base.resv, fence, 2157 1898 pt_update_ops->wait_vm_bookkeep ? ··· 2188 1925 2189 1926 switch (op->base.op) { 2190 1927 case DRM_GPUVA_OP_MAP: 2191 - if (!op->map.immediate && xe_vm_in_fault_mode(vm)) 1928 + if ((!op->map.immediate && xe_vm_in_fault_mode(vm)) || 1929 + op->map.is_cpu_addr_mirror) 2192 1930 break; 2193 1931 2194 1932 bind_op_commit(vm, tile, pt_update_ops, op->map.vma, fence, 2195 1933 fence2); 2196 1934 break; 2197 1935 case DRM_GPUVA_OP_REMAP: 2198 - unbind_op_commit(vm, tile, pt_update_ops, 2199 - gpuva_to_vma(op->base.remap.unmap->va), fence, 2200 - fence2); 1936 + { 1937 + struct xe_vma *old = gpuva_to_vma(op->base.remap.unmap->va); 1938 + 1939 + if (xe_vma_is_cpu_addr_mirror(old)) 1940 + break; 1941 + 1942 + unbind_op_commit(vm, tile, pt_update_ops, old, fence, fence2); 2201 1943 2202 1944 if (op->remap.prev) 2203 1945 bind_op_commit(vm, tile, pt_update_ops, op->remap.prev, ··· 2211 1943 bind_op_commit(vm, tile, pt_update_ops, op->remap.next, 2212 1944 fence, fence2); 2213 1945 break; 1946 + } 2214 1947 case DRM_GPUVA_OP_UNMAP: 2215 - unbind_op_commit(vm, tile, pt_update_ops, 2216 - gpuva_to_vma(op->base.unmap.va), fence, fence2); 1948 + { 1949 + struct xe_vma *vma = gpuva_to_vma(op->base.unmap.va); 1950 + 1951 + if (!xe_vma_is_cpu_addr_mirror(vma)) 1952 + unbind_op_commit(vm, tile, pt_update_ops, vma, fence, 1953 + fence2); 2217 1954 break; 1955 + } 2218 1956 case DRM_GPUVA_OP_PREFETCH: 2219 - bind_op_commit(vm, tile, pt_update_ops, 2220 - gpuva_to_vma(op->base.prefetch.va), fence, fence2); 1957 + { 1958 + struct xe_vma *vma = gpuva_to_vma(op->base.prefetch.va); 1959 + 1960 + if (!xe_vma_is_cpu_addr_mirror(vma)) 1961 + bind_op_commit(vm, tile, pt_update_ops, 
vma, fence, 1962 + fence2); 2221 1963 break; 1964 + } 1965 + case DRM_GPUVA_OP_DRIVER: 1966 + { 1967 + if (op->subop == XE_VMA_SUBOP_MAP_RANGE) { 1968 + op->map_range.range->tile_present |= BIT(tile->id); 1969 + op->map_range.range->tile_invalidated &= ~BIT(tile->id); 1970 + } else if (op->subop == XE_VMA_SUBOP_UNMAP_RANGE) { 1971 + op->unmap_range.range->tile_present &= ~BIT(tile->id); 1972 + } 1973 + break; 1974 + } 2222 1975 default: 2223 1976 drm_warn(&vm->xe->drm, "NOT POSSIBLE"); 2224 1977 } ··· 2255 1966 .populate = xe_vm_populate_pgtable, 2256 1967 .clear = xe_migrate_clear_pgtable_callback, 2257 1968 .pre_commit = xe_pt_userptr_pre_commit, 1969 + }; 1970 + 1971 + static const struct xe_migrate_pt_update_ops svm_migrate_ops = { 1972 + .populate = xe_vm_populate_pgtable, 1973 + .clear = xe_migrate_clear_pgtable_callback, 1974 + .pre_commit = xe_pt_svm_pre_commit, 2258 1975 }; 2259 1976 2260 1977 /** ··· 2288 1993 struct xe_vma_op *op; 2289 1994 int err = 0, i; 2290 1995 struct xe_migrate_pt_update update = { 2291 - .ops = pt_update_ops->needs_userptr_lock ? 1996 + .ops = pt_update_ops->needs_svm_lock ? 1997 + &svm_migrate_ops : 1998 + pt_update_ops->needs_userptr_lock ? 2292 1999 &userptr_migrate_ops : 2293 2000 &migrate_ops, 2294 2001 .vops = vops, ··· 2411 2114 &ifence->base.base, &mfence->base.base); 2412 2115 } 2413 2116 2117 + if (pt_update_ops->needs_svm_lock) 2118 + xe_svm_notifier_unlock(vm); 2414 2119 if (pt_update_ops->needs_userptr_lock) 2415 2120 up_read(&vm->userptr.notifier_lock); 2416 2121
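Note on the staging array used throughout this file: the prepare helpers (xe_pt_commit_prepare_bind, xe_pt_commit_prepare_unbind) now write only to pt_dir->staging[], xe_pt_commit() copies staging[] into children[], and the abort helpers restore staging[] from the saved old entries. The toy model below illustrates that two-phase pattern under simplifying assumptions: all `ex_` names are hypothetical, the directory has four slots instead of XE_PDES, and freeing of replaced nodes (xe_pt_destroy in the driver) is left out.

```c
#include <stddef.h>

#define EX_PDES 4	/* toy directory size, the driver uses XE_PDES */

struct ex_node {
	int id;
};

struct ex_dir {
	struct ex_node *children[EX_PDES];	/* committed view */
	struct ex_node *staging[EX_PDES];	/* view being prepared */
};

/*
 * Prepare: install new_node in the staging view only and hand back the
 * previous staging entry so the caller can later commit or abort.
 */
static struct ex_node *ex_prepare(struct ex_dir *dir, size_t idx,
				  struct ex_node *new_node)
{
	struct ex_node *old = dir->staging[idx];

	dir->staging[idx] = new_node;
	return old;
}

/* Commit: make the staged entry visible in the committed view. */
static void ex_commit(struct ex_dir *dir, size_t idx)
{
	dir->children[idx] = dir->staging[idx];
}

/* Abort: roll the staging view back, the committed view is untouched. */
static void ex_abort(struct ex_dir *dir, size_t idx, struct ex_node *old)
{
	dir->staging[idx] = old;
}
```

In this model a walker that reads children[] never observes an entry that was prepared but not yet committed, which appears to be the property the split is after.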

+5

drivers/gpu/drm/xe/xe_pt.h

···
 struct xe_bo;
 struct xe_device;
 struct xe_exec_queue;
+struct xe_svm_range;
 struct xe_sync_entry;
 struct xe_tile;
 struct xe_vm;
···

 void xe_pt_destroy(struct xe_pt *pt, u32 flags, struct llist_head *deferred);

+void xe_pt_clear(struct xe_device *xe, struct xe_pt *pt);
+
 int xe_pt_update_ops_prepare(struct xe_tile *tile, struct xe_vma_ops *vops);
 struct dma_fence *xe_pt_update_ops_run(struct xe_tile *tile,
 				       struct xe_vma_ops *vops);
···
 void xe_pt_update_ops_abort(struct xe_tile *tile, struct xe_vma_ops *vops);

 bool xe_pt_zap_ptes(struct xe_tile *tile, struct xe_vma *vma);
+bool xe_pt_zap_ptes_range(struct xe_tile *tile, struct xe_vm *vm,
+			  struct xe_svm_range *range);

 #endif

+2

drivers/gpu/drm/xe/xe_pt_types.h

···
 	u32 num_ops;
 	/** @current_op: current operations */
 	u32 current_op;
+	/** @needs_svm_lock: Needs SVM lock */
+	bool needs_svm_lock;
 	/** @needs_userptr_lock: Needs userptr lock */
 	bool needs_userptr_lock;
 	/** @needs_invalidation: Needs invalidation */
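Note on needs_svm_lock: in the xe_pt.c part of this change, xe_pt_update_ops_run() picks svm_migrate_ops when needs_svm_lock is set, otherwise userptr_migrate_ops when needs_userptr_lock is set, otherwise migrate_ops. The sketch below restates that precedence as a plain function. The enum and struct names are hypothetical stand-ins for the driver's ops tables and for struct xe_vm_pgtable_update_ops.

```c
#include <stdbool.h>

/* Hypothetical tags standing in for the three ops tables in xe_pt.c. */
enum ex_ops {
	EX_MIGRATE_OPS,
	EX_USERPTR_MIGRATE_OPS,
	EX_SVM_MIGRATE_OPS,
};

/* Only the two flags that drive the selection are modelled here. */
struct ex_update_ops {
	bool needs_svm_lock;
	bool needs_userptr_lock;
};

/* Same precedence as the nested ternary in the diff: SVM first. */
static enum ex_ops ex_select_ops(const struct ex_update_ops *u)
{
	if (u->needs_svm_lock)
		return EX_SVM_MIGRATE_OPS;
	if (u->needs_userptr_lock)
		return EX_USERPTR_MIGRATE_OPS;
	return EX_MIGRATE_OPS;
}
```

One consequence visible in the diff is that when both flags are set the SVM pre-commit hook runs, and that hook calls xe_pt_pre_commit() rather than the userptr variant. Whether both flags can be set for the same update is not something this hunk answers.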

+2 -1

drivers/gpu/drm/xe/xe_pt_walk.c

···
 		    u64 addr, u64 end, struct xe_pt_walk *walk)
 {
 	pgoff_t offset = xe_pt_offset(addr, level, walk);
-	struct xe_ptw **entries = parent->children ? parent->children : NULL;
+	struct xe_ptw **entries = walk->staging ? (parent->staging ?: NULL) :
+		(parent->children ?: NULL);
 	const struct xe_pt_walk_ops *ops = walk->ops;
 	enum page_walk_action action;
 	struct xe_ptw *child;
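Note on the entries selection above: the walker now reads parent->staging when walk->staging is set and parent->children otherwise. The `?: NULL` form is the GNU C conditional with an omitted middle operand, so `p ?: NULL` evaluates to `p` itself either way. A portable sketch of the same selection follows, with hypothetical `ex_` types standing in for struct xe_ptw and struct xe_pt_walk.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct xe_ptw: both arrays are NULL for a leaf table. */
struct ex_ptw {
	struct ex_ptw **children;
	struct ex_ptw **staging;
};

/* Stand-in for the one struct xe_pt_walk field that matters here. */
struct ex_walk {
	bool staging;
};

/* Pick which view of the directory this walk should traverse. */
static struct ex_ptw **ex_walk_entries(const struct ex_ptw *parent,
				       const struct ex_walk *walk)
{
	return walk->staging ? parent->staging : parent->children;
}
```

In the diff, the bind and unbind staging walks set `.staging = true`, while xe_pt_zap_ptes_range() leaves it unset and therefore walks the committed children[] view.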

+4

drivers/gpu/drm/xe/xe_pt_walk.h

···
 /**
  * struct xe_ptw - base class for driver pagetable subclassing.
  * @children: Pointer to an array of children if any.
+ * @staging: Pointer to an array of staging if any.
  *
  * Drivers could subclass this, and if it's a page-directory, typically
  * embed an array of xe_ptw pointers.
  */
 struct xe_ptw {
 	struct xe_ptw **children;
+	struct xe_ptw **staging;
 };

 /**
···
 	 * as shared pagetables.
 	 */
 	bool shared_pt_mode;
+	/** @staging: Walk staging PT structure */
+	bool staging;
 };

 /**

+50 -40

drivers/gpu/drm/xe/xe_pxp.c

··· 132 132 133 133 static void pxp_invalidate_queues(struct xe_pxp *pxp); 134 134 135 - static void pxp_invalidate_state(struct xe_pxp *pxp) 136 - { 137 - pxp_invalidate_queues(pxp); 138 - 139 - if (pxp->status == XE_PXP_ACTIVE) 140 - pxp->key_instance++; 141 - } 142 - 143 135 static int pxp_terminate_hw(struct xe_pxp *pxp) 144 136 { 145 137 struct xe_gt *gt = pxp->gt; ··· 185 193 186 194 mutex_lock(&pxp->mutex); 187 195 188 - pxp_invalidate_state(pxp); 196 + if (pxp->status == XE_PXP_ACTIVE) 197 + pxp->key_instance++; 189 198 190 199 /* 191 200 * we'll mark the status as needing termination on resume, so no need to ··· 212 219 mark_termination_in_progress(pxp); 213 220 214 221 mutex_unlock(&pxp->mutex); 222 + 223 + pxp_invalidate_queues(pxp); 215 224 216 225 ret = pxp_terminate_hw(pxp); 217 226 if (ret) { ··· 660 665 return ret; 661 666 } 662 667 668 + static void __pxp_exec_queue_remove(struct xe_pxp *pxp, struct xe_exec_queue *q, bool lock) 669 + { 670 + bool need_pm_put = false; 671 + 672 + if (!xe_pxp_is_enabled(pxp)) 673 + return; 674 + 675 + if (lock) 676 + spin_lock_irq(&pxp->queues.lock); 677 + 678 + if (!list_empty(&q->pxp.link)) { 679 + list_del_init(&q->pxp.link); 680 + need_pm_put = true; 681 + } 682 + 683 + q->pxp.type = DRM_XE_PXP_TYPE_NONE; 684 + 685 + if (lock) 686 + spin_unlock_irq(&pxp->queues.lock); 687 + 688 + if (need_pm_put) 689 + xe_pm_runtime_put(pxp->xe); 690 + } 691 + 663 692 /** 664 693 * xe_pxp_exec_queue_remove - remove a queue from the PXP list 665 694 * @pxp: the xe->pxp pointer (it will be NULL if PXP is disabled) ··· 695 676 */ 696 677 void xe_pxp_exec_queue_remove(struct xe_pxp *pxp, struct xe_exec_queue *q) 697 678 { 698 - bool need_pm_put = false; 699 - 700 - if (!xe_pxp_is_enabled(pxp)) 701 - return; 702 - 703 - spin_lock_irq(&pxp->queues.lock); 704 - 705 - if (!list_empty(&q->pxp.link)) { 706 - list_del_init(&q->pxp.link); 707 - need_pm_put = true; 708 - } 709 - 710 - q->pxp.type = DRM_XE_PXP_TYPE_NONE; 711 - 712 - 
spin_unlock_irq(&pxp->queues.lock); 713 - 714 - if (need_pm_put) 715 - xe_pm_runtime_put(pxp->xe); 679 + __pxp_exec_queue_remove(pxp, q, true); 716 680 } 717 681 718 682 static void pxp_invalidate_queues(struct xe_pxp *pxp) 719 683 { 720 684 struct xe_exec_queue *tmp, *q; 685 + LIST_HEAD(to_clean); 721 686 722 687 spin_lock_irq(&pxp->queues.lock); 723 688 724 - /* 725 - * Removing a queue from the PXP list requires a put of the RPM ref that 726 - * the queue holds to keep the PXP session alive, which can't be done 727 - * under spinlock. Since it is safe to kill a queue multiple times, we 728 - * can leave the invalid queue in the list for now and postpone the 729 - * removal and associated RPM put to when the queue is destroyed. 730 - */ 731 - list_for_each_entry(tmp, &pxp->queues.list, pxp.link) { 732 - q = xe_exec_queue_get_unless_zero(tmp); 733 - 689 + list_for_each_entry_safe(q, tmp, &pxp->queues.list, pxp.link) { 690 + q = xe_exec_queue_get_unless_zero(q); 734 691 if (!q) 735 692 continue; 736 693 694 + list_move_tail(&q->pxp.link, &to_clean); 695 + } 696 + spin_unlock_irq(&pxp->queues.lock); 697 + 698 + list_for_each_entry_safe(q, tmp, &to_clean, pxp.link) { 737 699 xe_exec_queue_kill(q); 700 + 701 + /* 702 + * We hold a ref to the queue so there is no risk of racing with 703 + * the calls to exec_queue_remove coming from exec_queue_destroy. 
704 + */ 705 + __pxp_exec_queue_remove(pxp, q, false); 706 + 738 707 xe_exec_queue_put(q); 739 708 } 740 - 741 - spin_unlock_irq(&pxp->queues.lock); 742 709 } 743 710 744 711 /** ··· 821 816 */ 822 817 int xe_pxp_pm_suspend(struct xe_pxp *pxp) 823 818 { 819 + bool needs_queue_inval = false; 824 820 int ret = 0; 825 821 826 822 if (!xe_pxp_is_enabled(pxp)) ··· 854 848 break; 855 849 fallthrough; 856 850 case XE_PXP_ACTIVE: 857 - pxp_invalidate_state(pxp); 851 + pxp->key_instance++; 852 + needs_queue_inval = true; 858 853 break; 859 854 default: 860 855 drm_err(&pxp->xe->drm, "unexpected state during PXP suspend: %u", ··· 871 864 pxp->status = XE_PXP_SUSPENDED; 872 865 873 866 mutex_unlock(&pxp->mutex); 867 + 868 + if (needs_queue_inval) 869 + pxp_invalidate_queues(pxp); 874 870 875 871 /* 876 872 * if there is a termination in progress, wait for it.
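
Note on the reworked `pxp_invalidate_queues()`: the change moves entries onto a local list while the spinlock is held and does the kill, removal, and runtime-PM put only after the lock is dropped. The sketch below models that two-phase shape in plain C. All names (`tracker`, `queue`, `pm_put()`, `invalidate_queues()`) are hypothetical, the lock is modelled by a boolean, and the sketch detaches every entry, whereas the driver skips queues whose reference count has already reached zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct queue {
	struct queue *next;
	bool killed;
};

struct tracker {
	bool locked;		/* models the spinlock state */
	struct queue *head;	/* tracked queues */
	int pm_refs;		/* models runtime-PM references held */
};

/* Models a put that must not run while the spinlock is held. */
static void pm_put(struct tracker *t)
{
	assert(!t->locked);
	t->pm_refs--;
}

static void invalidate_queues(struct tracker *t)
{
	struct queue *to_clean, *q, *next;

	/* Phase 1: under the lock, only detach entries onto a private list. */
	t->locked = true;
	to_clean = t->head;
	t->head = NULL;
	t->locked = false;

	/* Phase 2: lock dropped, so work that may sleep is allowed here. */
	for (q = to_clean; q; q = next) {
		next = q->next;
		q->next = NULL;
		q->killed = true;
		pm_put(t);
	}
}
```

Because the private list is owned by the caller, the second loop needs no locking at all, which is what lets the put happen immediately instead of being deferred to queue destruction.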

+49 -1

drivers/gpu/drm/xe/xe_query.c

··· 16 16 #include "regs/xe_gt_regs.h" 17 17 #include "xe_bo.h" 18 18 #include "xe_device.h" 19 + #include "xe_eu_stall.h" 19 20 #include "xe_exec_queue.h" 20 21 #include "xe_force_wake.h" 21 22 #include "xe_ggtt.h" ··· 338 337 config->info[DRM_XE_QUERY_CONFIG_REV_AND_DEVICE_ID] = 339 338 xe->info.devid | (xe->info.revid << 16); 340 339 if (xe_device_get_root_tile(xe)->mem.vram.usable_size) 341 - config->info[DRM_XE_QUERY_CONFIG_FLAGS] = 340 + config->info[DRM_XE_QUERY_CONFIG_FLAGS] |= 342 341 DRM_XE_QUERY_CONFIG_FLAG_HAS_VRAM; 342 + if (xe->info.has_usm && IS_ENABLED(CONFIG_DRM_GPUSVM)) 343 + config->info[DRM_XE_QUERY_CONFIG_FLAGS] |= 344 + DRM_XE_QUERY_CONFIG_FLAG_HAS_CPU_ADDR_MIRROR; 345 + config->info[DRM_XE_QUERY_CONFIG_FLAGS] |= 346 + DRM_XE_QUERY_CONFIG_FLAG_HAS_LOW_LATENCY; 343 347 config->info[DRM_XE_QUERY_CONFIG_MIN_ALIGNMENT] = 344 348 xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K ? SZ_64K : SZ_4K; 345 349 config->info[DRM_XE_QUERY_CONFIG_VA_BITS] = xe->info.va_bits; ··· 735 729 return 0; 736 730 } 737 731 732 + static int query_eu_stall(struct xe_device *xe, 733 + struct drm_xe_device_query *query) 734 + { 735 + void __user *query_ptr = u64_to_user_ptr(query->data); 736 + struct drm_xe_query_eu_stall *info; 737 + size_t size, array_size; 738 + const u64 *rates; 739 + u32 num_rates; 740 + int ret; 741 + 742 + if (!xe_eu_stall_supported_on_platform(xe)) { 743 + drm_dbg(&xe->drm, "EU stall monitoring is not supported on this platform\n"); 744 + return -ENODEV; 745 + } 746 + 747 + array_size = xe_eu_stall_get_sampling_rates(&num_rates, &rates); 748 + size = sizeof(struct drm_xe_query_eu_stall) + array_size; 749 + 750 + if (query->size == 0) { 751 + query->size = size; 752 + return 0; 753 + } else if (XE_IOCTL_DBG(xe, query->size != size)) { 754 + return -EINVAL; 755 + } 756 + 757 + info = kzalloc(size, GFP_KERNEL); 758 + if (!info) 759 + return -ENOMEM; 760 + 761 + info->num_sampling_rates = num_rates; 762 + info->capabilities = DRM_XE_EU_STALL_CAPS_BASE; 763 
+ info->record_size = xe_eu_stall_data_record_size(xe); 764 + info->per_xecore_buf_size = xe_eu_stall_get_per_xecore_buf_size(); 765 + memcpy(info->sampling_rates, rates, array_size); 766 + 767 + ret = copy_to_user(query_ptr, info, size); 768 + kfree(info); 769 + 770 + return ret ? -EFAULT : 0; 771 + } 772 + 738 773 static int (* const xe_query_funcs[])(struct xe_device *xe, 739 774 struct drm_xe_device_query *query) = { 740 775 query_engines, ··· 788 741 query_uc_fw_version, 789 742 query_oa_units, 790 743 query_pxp_status, 744 + query_eu_stall, 791 745 }; 792 746 793 747 int xe_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
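
Note on the size handshake in `query_eu_stall()`: a call with `size == 0` only reports the required size, and a later call must pass exactly that size or it gets `-EINVAL`. The following userspace model shows the same handshake. The structures `query` and `reply` and the function `query_rates()` are invented for this sketch, and a plain `memcpy()` stands in for the kernel's `copy_to_user()`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct query {
	uint32_t size;	/* in: 0 or buffer size; out: required size */
	void *data;	/* caller buffer, suitably aligned for struct reply */
};

struct reply {
	uint64_t num_rates;
	uint64_t rates[];	/* num_rates entries follow the header */
};

static int query_rates(struct query *q, const uint64_t *rates, uint64_t num)
{
	size_t array_size = num * sizeof(*rates);
	size_t size = sizeof(struct reply) + array_size;
	struct reply *out;

	/* First call: report how large the buffer has to be. */
	if (q->size == 0) {
		q->size = (uint32_t)size;
		return 0;
	}

	/* Second call: the size must match exactly. */
	if (q->size != size)
		return -EINVAL;

	out = q->data;
	out->num_rates = num;
	memcpy(out->rates, rates, array_size);
	return 0;
}
```

A caller would issue the query once with `size = 0`, allocate `size` bytes, and then repeat the call with the buffer attached.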

+121 -2

drivers/gpu/drm/xe/xe_res_cursor.h

··· 26 26 27 27 #include <linux/scatterlist.h> 28 28 29 + #include <drm/drm_pagemap.h> 29 30 #include <drm/ttm/ttm_placement.h> 30 31 #include <drm/ttm/ttm_range_manager.h> 31 32 #include <drm/ttm/ttm_resource.h> ··· 35 34 #include "xe_bo.h" 36 35 #include "xe_device.h" 37 36 #include "xe_macros.h" 37 + #include "xe_svm.h" 38 38 #include "xe_ttm_vram_mgr.h" 39 39 40 - /* state back for walking over vram_mgr, stolen_mgr, and gtt_mgr allocations */ 40 + /** 41 + * struct xe_res_cursor - state for walking over dma mapping, vram_mgr, 42 + * stolen_mgr, and gtt_mgr allocations 43 + */ 41 44 struct xe_res_cursor { 45 + /** @start: Start of cursor */ 42 46 u64 start; 47 + /** @size: Size of the current segment. */ 43 48 u64 size; 49 + /** @remaining: Remaining bytes in cursor */ 44 50 u64 remaining; 51 + /** @node: Opaque point current node cursor */ 45 52 void *node; 53 + /** @mem_type: Memory type */ 46 54 u32 mem_type; 55 + /** @sgl: Scatterlist for cursor */ 47 56 struct scatterlist *sgl; 57 + /** @dma_addr: Current element in a struct drm_pagemap_device_addr array */ 58 + const struct drm_pagemap_device_addr *dma_addr; 59 + /** @mm: Buddy allocator for VRAM cursor */ 48 60 struct drm_buddy *mm; 61 + /** 62 + * @dma_start: DMA start address for the current segment. 63 + * This may be different to @dma_addr.addr since elements in 64 + * the array may be coalesced to a single segment. 65 + */ 66 + u64 dma_start; 67 + /** @dma_seg_size: Size of the current DMA segment. 
*/ 68 + u64 dma_seg_size; 49 69 }; 50 70 51 71 static struct drm_buddy *xe_res_get_buddy(struct ttm_resource *res) ··· 92 70 struct xe_res_cursor *cur) 93 71 { 94 72 cur->sgl = NULL; 73 + cur->dma_addr = NULL; 95 74 if (!res) 96 75 goto fallback; 97 76 ··· 165 142 } 166 143 167 144 /** 145 + * __xe_res_dma_next() - Advance the cursor when end-of-segment is reached 146 + * @cur: The cursor 147 + */ 148 + static inline void __xe_res_dma_next(struct xe_res_cursor *cur) 149 + { 150 + const struct drm_pagemap_device_addr *addr = cur->dma_addr; 151 + u64 start = cur->start; 152 + 153 + while (start >= cur->dma_seg_size) { 154 + start -= cur->dma_seg_size; 155 + addr++; 156 + cur->dma_seg_size = PAGE_SIZE << addr->order; 157 + } 158 + cur->dma_start = addr->addr; 159 + 160 + /* Coalesce array_elements */ 161 + while (cur->dma_seg_size - start < cur->remaining) { 162 + if (cur->dma_start + cur->dma_seg_size != addr[1].addr || 163 + addr->proto != addr[1].proto) 164 + break; 165 + addr++; 166 + cur->dma_seg_size += PAGE_SIZE << addr->order; 167 + } 168 + 169 + cur->dma_addr = addr; 170 + cur->start = start; 171 + cur->size = cur->dma_seg_size - start; 172 + } 173 + 174 + /** 168 175 * xe_res_first_sg - initialize a xe_res_cursor with a scatter gather table 169 176 * 170 177 * @sg: scatter gather table to walk ··· 213 160 cur->start = start; 214 161 cur->remaining = size; 215 162 cur->size = 0; 163 + cur->dma_addr = NULL; 216 164 cur->sgl = sg->sgl; 217 165 cur->mem_type = XE_PL_TT; 218 166 __xe_res_sg_next(cur); 167 + } 168 + 169 + /** 170 + * xe_res_first_dma - initialize a xe_res_cursor with dma_addr array 171 + * 172 + * @dma_addr: struct drm_pagemap_device_addr array to walk 173 + * @start: Start of the range 174 + * @size: Size of the range 175 + * @cur: cursor object to initialize 176 + * 177 + * Start walking over the range of allocations between @start and @size. 
178 + */ 179 + static inline void xe_res_first_dma(const struct drm_pagemap_device_addr *dma_addr, 180 + u64 start, u64 size, 181 + struct xe_res_cursor *cur) 182 + { 183 + XE_WARN_ON(!dma_addr); 184 + XE_WARN_ON(!IS_ALIGNED(start, PAGE_SIZE) || 185 + !IS_ALIGNED(size, PAGE_SIZE)); 186 + 187 + cur->node = NULL; 188 + cur->start = start; 189 + cur->remaining = size; 190 + cur->dma_seg_size = PAGE_SIZE << dma_addr->order; 191 + cur->dma_start = 0; 192 + cur->size = 0; 193 + cur->dma_addr = dma_addr; 194 + __xe_res_dma_next(cur); 195 + cur->sgl = NULL; 196 + cur->mem_type = XE_PL_TT; 219 197 } 220 198 221 199 /** ··· 272 188 if (cur->size > size) { 273 189 cur->size -= size; 274 190 cur->start += size; 191 + return; 192 + } 193 + 194 + if (cur->dma_addr) { 195 + cur->start += size; 196 + __xe_res_dma_next(cur); 275 197 return; 276 198 } 277 199 ··· 322 232 */ 323 233 static inline u64 xe_res_dma(const struct xe_res_cursor *cur) 324 234 { 325 - return cur->sgl ? sg_dma_address(cur->sgl) + cur->start : cur->start; 235 + if (cur->dma_addr) 236 + return cur->dma_start + cur->start; 237 + else if (cur->sgl) 238 + return sg_dma_address(cur->sgl) + cur->start; 239 + else 240 + return cur->start; 241 + } 242 + 243 + /** 244 + * xe_res_is_vram() - Whether the cursor current dma address points to 245 + * same-device VRAM 246 + * @cur: The cursor. 247 + * 248 + * Return: true iff the address returned by xe_res_dma() points to internal vram. 249 + */ 250 + static inline bool xe_res_is_vram(const struct xe_res_cursor *cur) 251 + { 252 + if (cur->dma_addr) 253 + return cur->dma_addr->proto == XE_INTERCONNECT_VRAM; 254 + 255 + switch (cur->mem_type) { 256 + case XE_PL_STOLEN: 257 + case XE_PL_VRAM0: 258 + case XE_PL_VRAM1: 259 + return true; 260 + default: 261 + break; 262 + } 263 + 264 + return false; 326 265 } 327 266 #endif
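
Note on segment coalescing in `__xe_res_dma_next()`: neighbouring array elements are merged into one segment while they are contiguous in address space and use the same `proto`. The sketch below isolates that merge step. `dev_addr`, `SKETCH_PAGE_SIZE`, and `coalesce()` are illustrative names. Unlike the driver code, which relies on `remaining` to stop the loop, the sketch also takes an explicit element count so that it never reads past the array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ull

struct dev_addr {
	uint64_t addr;
	unsigned int order;	/* entry covers SKETCH_PAGE_SIZE << order bytes */
	int proto;		/* interconnect type of this entry */
};

/*
 * Return the size in bytes of the contiguous segment starting at
 * entries[0]. Following entries are merged while they are adjacent in
 * address space, share the same proto, and the walk still needs more bytes.
 */
static uint64_t coalesce(const struct dev_addr *entries, size_t count,
			 uint64_t remaining)
{
	uint64_t seg;
	size_t i = 0;

	assert(entries && count > 0);
	seg = SKETCH_PAGE_SIZE << entries[0].order;

	while (seg < remaining && i + 1 < count) {
		const struct dev_addr *next = &entries[i + 1];

		if (entries[0].addr + seg != next->addr ||
		    entries[i].proto != next->proto)
			break;
		seg += SKETCH_PAGE_SIZE << next->order;
		i++;
	}

	return seg;
}
```

Merging up front means a consumer such as a page-table binder sees fewer, larger segments and can pick larger page sizes where the hardware allows it.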

+4

drivers/gpu/drm/xe/xe_ring_ops.c

··· 177 177 bool lacks_render = !(gt->info.engine_mask & XE_HW_ENGINE_RCS_MASK); 178 178 u32 flags; 179 179 180 + if (XE_WA(gt, 14016712196)) 181 + i = emit_pipe_control(dw, i, 0, PIPE_CONTROL_DEPTH_CACHE_FLUSH, 182 + LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0); 183 + 180 184 flags = (PIPE_CONTROL_CS_STALL | 181 185 PIPE_CONTROL_TILE_CACHE_FLUSH | 182 186 PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH |

+38 -39

drivers/gpu/drm/xe/xe_survivability_mode.c

··· 127 127 128 128 static DEVICE_ATTR_ADMIN_RO(survivability_mode); 129 129 130 - static void enable_survivability_mode(struct pci_dev *pdev) 130 + static void xe_survivability_mode_fini(void *arg) 131 + { 132 + struct xe_device *xe = arg; 133 + struct pci_dev *pdev = to_pci_dev(xe->drm.dev); 134 + struct device *dev = &pdev->dev; 135 + 136 + sysfs_remove_file(&dev->kobj, &dev_attr_survivability_mode.attr); 137 + } 138 + 139 + static int enable_survivability_mode(struct pci_dev *pdev) 131 140 { 132 141 struct device *dev = &pdev->dev; 133 142 struct xe_device *xe = pdev_to_xe_device(pdev); 134 143 struct xe_survivability *survivability = &xe->survivability; 135 144 int ret = 0; 136 145 137 - /* set survivability mode */ 138 - survivability->mode = true; 139 - dev_info(dev, "In Survivability Mode\n"); 140 - 141 146 /* create survivability mode sysfs */ 142 147 ret = sysfs_create_file(&dev->kobj, &dev_attr_survivability_mode.attr); 143 148 if (ret) { 144 149 dev_warn(dev, "Failed to create survivability sysfs files\n"); 145 - return; 150 + return ret; 146 151 } 147 152 148 - xe_heci_gsc_init(xe); 153 + ret = devm_add_action_or_reset(xe->drm.dev, 154 + xe_survivability_mode_fini, xe); 155 + if (ret) 156 + return ret; 157 + 158 + ret = xe_heci_gsc_init(xe); 159 + if (ret) 160 + return ret; 149 161 150 162 xe_vsec_init(xe); 163 + 164 + survivability->mode = true; 165 + dev_err(dev, "In Survivability Mode\n"); 166 + 167 + return 0; 151 168 } 152 169 153 170 /** 154 - * xe_survivability_mode_enabled - check if survivability mode is enabled 171 + * xe_survivability_mode_is_enabled - check if survivability mode is enabled 155 172 * @xe: xe device instance 156 173 * 157 174 * Returns true if in survivability mode, false otherwise 158 175 */ 159 - bool xe_survivability_mode_enabled(struct xe_device *xe) 176 + bool xe_survivability_mode_is_enabled(struct xe_device *xe) 160 177 { 161 - struct xe_survivability *survivability = &xe->survivability; 162 - 163 - return 
survivability->mode; 178 + return xe->survivability.mode; 164 179 } 165 180 166 181 /** ··· 198 183 data = xe_mmio_read32(mmio, PCODE_SCRATCH(0)); 199 184 survivability->boot_status = REG_FIELD_GET(BOOT_STATUS, data); 200 185 201 - return (survivability->boot_status == NON_CRITICAL_FAILURE || 202 - survivability->boot_status == CRITICAL_FAILURE); 186 + return survivability->boot_status == NON_CRITICAL_FAILURE || 187 + survivability->boot_status == CRITICAL_FAILURE; 203 188 } 204 189 205 190 /** 206 - * xe_survivability_mode_remove - remove survivability mode 191 + * xe_survivability_mode_enable - Initialize and enable the survivability mode 207 192 * @xe: xe device instance 208 193 * 209 - * clean up sysfs entries of survivability mode 210 - */ 211 - void xe_survivability_mode_remove(struct xe_device *xe) 212 - { 213 - struct xe_survivability *survivability = &xe->survivability; 214 - struct pci_dev *pdev = to_pci_dev(xe->drm.dev); 215 - struct device *dev = &pdev->dev; 216 - 217 - sysfs_remove_file(&dev->kobj, &dev_attr_survivability_mode.attr); 218 - xe_heci_gsc_fini(xe); 219 - kfree(survivability->info); 220 - pci_set_drvdata(pdev, NULL); 221 - } 222 - 223 - /** 224 - * xe_survivability_mode_init - Initialize the survivability mode 225 - * @xe: xe device instance 194 + * Initialize survivability information and enable survivability mode 226 195 * 227 - * Initializes survivability information and enables survivability mode 196 + * Return: 0 for success, negative error code otherwise. 
228 197 */ 229 - void xe_survivability_mode_init(struct xe_device *xe) 198 + int xe_survivability_mode_enable(struct xe_device *xe) 230 199 { 231 200 struct xe_survivability *survivability = &xe->survivability; 232 201 struct xe_survivability_info *info; ··· 218 219 219 220 survivability->size = MAX_SCRATCH_MMIO; 220 221 221 - info = kcalloc(survivability->size, sizeof(*info), GFP_KERNEL); 222 + info = devm_kcalloc(xe->drm.dev, survivability->size, sizeof(*info), 223 + GFP_KERNEL); 222 224 if (!info) 223 - return; 225 + return -ENOMEM; 224 226 225 227 survivability->info = info; 226 228 ··· 230 230 /* Only log debug information and exit if it is a critical failure */ 231 231 if (survivability->boot_status == CRITICAL_FAILURE) { 232 232 log_survivability_info(pdev); 233 - kfree(survivability->info); 234 - return; 233 + return -ENXIO; 235 234 } 236 235 237 - enable_survivability_mode(pdev); 236 + return enable_survivability_mode(pdev); 238 237 }
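
Note on the cleanup rework: the explicit `xe_survivability_mode_remove()` is replaced by a device-managed action registered with `devm_add_action_or_reset()`, which runs the action immediately if registration fails. The toy model below shows that contract in userspace. The `dev` structure, `MAX_ACTIONS`, `add_action_or_reset()`, `release_all()`, and `remove_attr()` are all hypothetical and only mimic the behaviour; they are not the kernel's devres implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_ACTIONS 4

typedef void (*action_fn)(void *arg);

struct dev {
	action_fn fn[MAX_ACTIONS];
	void *arg[MAX_ACTIONS];
	size_t count;
};

/* Register a cleanup action; if that fails, run it right away. */
static int add_action_or_reset(struct dev *d, action_fn fn, void *arg)
{
	assert(d && fn);

	if (d->count == MAX_ACTIONS) {
		fn(arg);
		return -ENOMEM;
	}

	d->fn[d->count] = fn;
	d->arg[d->count] = arg;
	d->count++;
	return 0;
}

/* On teardown, run the registered actions in reverse order. */
static void release_all(struct dev *d)
{
	while (d->count) {
		d->count--;
		d->fn[d->count](d->arg[d->count]);
	}
}

/* Example action: clears a flag that models a created sysfs file. */
static void remove_attr(void *arg)
{
	*(int *)arg = 0;
}
```

With this shape the error paths in the enable function can simply return, since anything registered so far is undone when the device is released.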

+2 -3

drivers/gpu/drm/xe/xe_survivability_mode.h

··· 10 10 11 11 struct xe_device; 12 12 13 - void xe_survivability_mode_init(struct xe_device *xe); 14 - void xe_survivability_mode_remove(struct xe_device *xe); 15 - bool xe_survivability_mode_enabled(struct xe_device *xe); 13 + int xe_survivability_mode_enable(struct xe_device *xe); 14 + bool xe_survivability_mode_is_enabled(struct xe_device *xe); 16 15 bool xe_survivability_mode_required(struct xe_device *xe); 17 16 18 17 #endif /* _XE_SURVIVABILITY_MODE_H_ */

+946

drivers/gpu/drm/xe/xe_svm.c

··· 1 + // SPDX-License-Identifier: MIT 2 + /* 3 + * Copyright © 2024 Intel Corporation 4 + */ 5 + 6 + #include "xe_bo.h" 7 + #include "xe_gt_tlb_invalidation.h" 8 + #include "xe_migrate.h" 9 + #include "xe_module.h" 10 + #include "xe_pt.h" 11 + #include "xe_svm.h" 12 + #include "xe_ttm_vram_mgr.h" 13 + #include "xe_vm.h" 14 + #include "xe_vm_types.h" 15 + 16 + static bool xe_svm_range_in_vram(struct xe_svm_range *range) 17 + { 18 + /* Not reliable without notifier lock */ 19 + return range->base.flags.has_devmem_pages; 20 + } 21 + 22 + static bool xe_svm_range_has_vram_binding(struct xe_svm_range *range) 23 + { 24 + /* Not reliable without notifier lock */ 25 + return xe_svm_range_in_vram(range) && range->tile_present; 26 + } 27 + 28 + static struct xe_vm *gpusvm_to_vm(struct drm_gpusvm *gpusvm) 29 + { 30 + return container_of(gpusvm, struct xe_vm, svm.gpusvm); 31 + } 32 + 33 + static struct xe_vm *range_to_vm(struct drm_gpusvm_range *r) 34 + { 35 + return gpusvm_to_vm(r->gpusvm); 36 + } 37 + 38 + static unsigned long xe_svm_range_start(struct xe_svm_range *range) 39 + { 40 + return drm_gpusvm_range_start(&range->base); 41 + } 42 + 43 + static unsigned long xe_svm_range_end(struct xe_svm_range *range) 44 + { 45 + return drm_gpusvm_range_end(&range->base); 46 + } 47 + 48 + static unsigned long xe_svm_range_size(struct xe_svm_range *range) 49 + { 50 + return drm_gpusvm_range_size(&range->base); 51 + } 52 + 53 + #define range_debug(r__, operaton__) \ 54 + vm_dbg(&range_to_vm(&(r__)->base)->xe->drm, \ 55 + "%s: asid=%u, gpusvm=%p, vram=%d,%d, seqno=%lu, " \ 56 + "start=0x%014lx, end=0x%014lx, size=%lu", \ 57 + (operaton__), range_to_vm(&(r__)->base)->usm.asid, \ 58 + (r__)->base.gpusvm, \ 59 + xe_svm_range_in_vram((r__)) ? 1 : 0, \ 60 + xe_svm_range_has_vram_binding((r__)) ? 
1 : 0, \ 61 + (r__)->base.notifier_seq, \ 62 + xe_svm_range_start((r__)), xe_svm_range_end((r__)), \ 63 + xe_svm_range_size((r__))) 64 + 65 + void xe_svm_range_debug(struct xe_svm_range *range, const char *operation) 66 + { 67 + range_debug(range, operation); 68 + } 69 + 70 + static void *xe_svm_devm_owner(struct xe_device *xe) 71 + { 72 + return xe; 73 + } 74 + 75 + static struct drm_gpusvm_range * 76 + xe_svm_range_alloc(struct drm_gpusvm *gpusvm) 77 + { 78 + struct xe_svm_range *range; 79 + 80 + range = kzalloc(sizeof(*range), GFP_KERNEL); 81 + if (!range) 82 + return ERR_PTR(-ENOMEM); 83 + 84 + INIT_LIST_HEAD(&range->garbage_collector_link); 85 + xe_vm_get(gpusvm_to_vm(gpusvm)); 86 + 87 + return &range->base; 88 + } 89 + 90 + static void xe_svm_range_free(struct drm_gpusvm_range *range) 91 + { 92 + xe_vm_put(range_to_vm(range)); 93 + kfree(range); 94 + } 95 + 96 + static struct xe_svm_range *to_xe_range(struct drm_gpusvm_range *r) 97 + { 98 + return container_of(r, struct xe_svm_range, base); 99 + } 100 + 101 + static void 102 + xe_svm_garbage_collector_add_range(struct xe_vm *vm, struct xe_svm_range *range, 103 + const struct mmu_notifier_range *mmu_range) 104 + { 105 + struct xe_device *xe = vm->xe; 106 + 107 + range_debug(range, "GARBAGE COLLECTOR ADD"); 108 + 109 + drm_gpusvm_range_set_unmapped(&range->base, mmu_range); 110 + 111 + spin_lock(&vm->svm.garbage_collector.lock); 112 + if (list_empty(&range->garbage_collector_link)) 113 + list_add_tail(&range->garbage_collector_link, 114 + &vm->svm.garbage_collector.range_list); 115 + spin_unlock(&vm->svm.garbage_collector.lock); 116 + 117 + queue_work(xe_device_get_root_tile(xe)->primary_gt->usm.pf_wq, 118 + &vm->svm.garbage_collector.work); 119 + } 120 + 121 + static u8 122 + xe_svm_range_notifier_event_begin(struct xe_vm *vm, struct drm_gpusvm_range *r, 123 + const struct mmu_notifier_range *mmu_range, 124 + u64 *adj_start, u64 *adj_end) 125 + { 126 + struct xe_svm_range *range = to_xe_range(r); 127 + struct 
xe_device *xe = vm->xe; 128 + struct xe_tile *tile; 129 + u8 tile_mask = 0; 130 + u8 id; 131 + 132 + xe_svm_assert_in_notifier(vm); 133 + 134 + range_debug(range, "NOTIFIER"); 135 + 136 + /* Skip if already unmapped or if no binding exist */ 137 + if (range->base.flags.unmapped || !range->tile_present) 138 + return 0; 139 + 140 + range_debug(range, "NOTIFIER - EXECUTE"); 141 + 142 + /* Adjust invalidation to range boundaries */ 143 + *adj_start = min(xe_svm_range_start(range), mmu_range->start); 144 + *adj_end = max(xe_svm_range_end(range), mmu_range->end); 145 + 146 + /* 147 + * XXX: Ideally would zap PTEs in one shot in xe_svm_invalidate but the 148 + * invalidation code can't correctly cope with sparse ranges or 149 + * invalidations spanning multiple ranges. 150 + */ 151 + for_each_tile(tile, xe, id) 152 + if (xe_pt_zap_ptes_range(tile, vm, range)) { 153 + tile_mask |= BIT(id); 154 + range->tile_invalidated |= BIT(id); 155 + } 156 + 157 + return tile_mask; 158 + } 159 + 160 + static void 161 + xe_svm_range_notifier_event_end(struct xe_vm *vm, struct drm_gpusvm_range *r, 162 + const struct mmu_notifier_range *mmu_range) 163 + { 164 + struct drm_gpusvm_ctx ctx = { .in_notifier = true, }; 165 + 166 + xe_svm_assert_in_notifier(vm); 167 + 168 + drm_gpusvm_range_unmap_pages(&vm->svm.gpusvm, r, &ctx); 169 + if (!xe_vm_is_closed(vm) && mmu_range->event == MMU_NOTIFY_UNMAP) 170 + xe_svm_garbage_collector_add_range(vm, to_xe_range(r), 171 + mmu_range); 172 + } 173 + 174 + static void xe_svm_invalidate(struct drm_gpusvm *gpusvm, 175 + struct drm_gpusvm_notifier *notifier, 176 + const struct mmu_notifier_range *mmu_range) 177 + { 178 + struct xe_vm *vm = gpusvm_to_vm(gpusvm); 179 + struct xe_device *xe = vm->xe; 180 + struct xe_tile *tile; 181 + struct drm_gpusvm_range *r, *first; 182 + struct xe_gt_tlb_invalidation_fence 183 + fence[XE_MAX_TILES_PER_DEVICE * XE_MAX_GT_PER_TILE]; 184 + u64 adj_start = mmu_range->start, adj_end = mmu_range->end; 185 + u8 tile_mask = 0; 186 
+ u8 id; 187 + u32 fence_id = 0; 188 + long err; 189 + 190 + xe_svm_assert_in_notifier(vm); 191 + 192 + vm_dbg(&gpusvm_to_vm(gpusvm)->xe->drm, 193 + "INVALIDATE: asid=%u, gpusvm=%p, seqno=%lu, start=0x%016lx, end=0x%016lx, event=%d", 194 + vm->usm.asid, gpusvm, notifier->notifier.invalidate_seq, 195 + mmu_range->start, mmu_range->end, mmu_range->event); 196 + 197 + /* Adjust invalidation to notifier boundaries */ 198 + adj_start = max(drm_gpusvm_notifier_start(notifier), adj_start); 199 + adj_end = min(drm_gpusvm_notifier_end(notifier), adj_end); 200 + 201 + first = drm_gpusvm_range_find(notifier, adj_start, adj_end); 202 + if (!first) 203 + return; 204 + 205 + /* 206 + * PTs may be getting destroyed so not safe to touch these but PT should 207 + * be invalidated at this point in time. Regardless we still need to 208 + * ensure any dma mappings are unmapped in the here. 209 + */ 210 + if (xe_vm_is_closed(vm)) 211 + goto range_notifier_event_end; 212 + 213 + /* 214 + * XXX: Less than ideal to always wait on VM's resv slots if an 215 + * invalidation is not required. Could walk range list twice to figure 216 + * out if an invalidations is need, but also not ideal. 
217 + */ 218 + err = dma_resv_wait_timeout(xe_vm_resv(vm), 219 + DMA_RESV_USAGE_BOOKKEEP, 220 + false, MAX_SCHEDULE_TIMEOUT); 221 + XE_WARN_ON(err <= 0); 222 + 223 + r = first; 224 + drm_gpusvm_for_each_range(r, notifier, adj_start, adj_end) 225 + tile_mask |= xe_svm_range_notifier_event_begin(vm, r, mmu_range, 226 + &adj_start, 227 + &adj_end); 228 + if (!tile_mask) 229 + goto range_notifier_event_end; 230 + 231 + xe_device_wmb(xe); 232 + 233 + for_each_tile(tile, xe, id) { 234 + if (tile_mask & BIT(id)) { 235 + int err; 236 + 237 + xe_gt_tlb_invalidation_fence_init(tile->primary_gt, 238 + &fence[fence_id], true); 239 + 240 + err = xe_gt_tlb_invalidation_range(tile->primary_gt, 241 + &fence[fence_id], 242 + adj_start, 243 + adj_end, 244 + vm->usm.asid); 245 + if (WARN_ON_ONCE(err < 0)) 246 + goto wait; 247 + ++fence_id; 248 + 249 + if (!tile->media_gt) 250 + continue; 251 + 252 + xe_gt_tlb_invalidation_fence_init(tile->media_gt, 253 + &fence[fence_id], true); 254 + 255 + err = xe_gt_tlb_invalidation_range(tile->media_gt, 256 + &fence[fence_id], 257 + adj_start, 258 + adj_end, 259 + vm->usm.asid); 260 + if (WARN_ON_ONCE(err < 0)) 261 + goto wait; 262 + ++fence_id; 263 + } 264 + } 265 + 266 + wait: 267 + for (id = 0; id < fence_id; ++id) 268 + xe_gt_tlb_invalidation_fence_wait(&fence[id]); 269 + 270 + range_notifier_event_end: 271 + r = first; 272 + drm_gpusvm_for_each_range(r, notifier, adj_start, adj_end) 273 + xe_svm_range_notifier_event_end(vm, r, mmu_range); 274 + } 275 + 276 + static int __xe_svm_garbage_collector(struct xe_vm *vm, 277 + struct xe_svm_range *range) 278 + { 279 + struct dma_fence *fence; 280 + 281 + range_debug(range, "GARBAGE COLLECTOR"); 282 + 283 + xe_vm_lock(vm, false); 284 + fence = xe_vm_range_unbind(vm, range); 285 + xe_vm_unlock(vm); 286 + if (IS_ERR(fence)) 287 + return PTR_ERR(fence); 288 + dma_fence_put(fence); 289 + 290 + drm_gpusvm_range_remove(&vm->svm.gpusvm, &range->base); 291 + 292 + return 0; 293 + } 294 + 295 + static int 
xe_svm_garbage_collector(struct xe_vm *vm) 296 + { 297 + struct xe_svm_range *range; 298 + int err; 299 + 300 + lockdep_assert_held_write(&vm->lock); 301 + 302 + if (xe_vm_is_closed_or_banned(vm)) 303 + return -ENOENT; 304 + 305 + spin_lock(&vm->svm.garbage_collector.lock); 306 + for (;;) { 307 + range = list_first_entry_or_null(&vm->svm.garbage_collector.range_list, 308 + typeof(*range), 309 + garbage_collector_link); 310 + if (!range) 311 + break; 312 + 313 + list_del(&range->garbage_collector_link); 314 + spin_unlock(&vm->svm.garbage_collector.lock); 315 + 316 + err = __xe_svm_garbage_collector(vm, range); 317 + if (err) { 318 + drm_warn(&vm->xe->drm, 319 + "Garbage collection failed: %pe\n", 320 + ERR_PTR(err)); 321 + xe_vm_kill(vm, true); 322 + return err; 323 + } 324 + 325 + spin_lock(&vm->svm.garbage_collector.lock); 326 + } 327 + spin_unlock(&vm->svm.garbage_collector.lock); 328 + 329 + return 0; 330 + } 331 + 332 + static void xe_svm_garbage_collector_work_func(struct work_struct *w) 333 + { 334 + struct xe_vm *vm = container_of(w, struct xe_vm, 335 + svm.garbage_collector.work); 336 + 337 + down_write(&vm->lock); 338 + xe_svm_garbage_collector(vm); 339 + up_write(&vm->lock); 340 + } 341 + 342 + static struct xe_vram_region *page_to_vr(struct page *page) 343 + { 344 + return container_of(page->pgmap, struct xe_vram_region, pagemap); 345 + } 346 + 347 + static struct xe_tile *vr_to_tile(struct xe_vram_region *vr) 348 + { 349 + return container_of(vr, struct xe_tile, mem.vram); 350 + } 351 + 352 + static u64 xe_vram_region_page_to_dpa(struct xe_vram_region *vr, 353 + struct page *page) 354 + { 355 + u64 dpa; 356 + struct xe_tile *tile = vr_to_tile(vr); 357 + u64 pfn = page_to_pfn(page); 358 + u64 offset; 359 + 360 + xe_tile_assert(tile, is_device_private_page(page)); 361 + xe_tile_assert(tile, (pfn << PAGE_SHIFT) >= vr->hpa_base); 362 + 363 + offset = (pfn << PAGE_SHIFT) - vr->hpa_base; 364 + dpa = vr->dpa_base + offset; 365 + 366 + return dpa; 367 + } 368 + 
369 + enum xe_svm_copy_dir { 370 + XE_SVM_COPY_TO_VRAM, 371 + XE_SVM_COPY_TO_SRAM, 372 + }; 373 + 374 + static int xe_svm_copy(struct page **pages, dma_addr_t *dma_addr, 375 + unsigned long npages, const enum xe_svm_copy_dir dir) 376 + { 377 + struct xe_vram_region *vr = NULL; 378 + struct xe_tile *tile; 379 + struct dma_fence *fence = NULL; 380 + unsigned long i; 381 + #define XE_VRAM_ADDR_INVALID ~0x0ull 382 + u64 vram_addr = XE_VRAM_ADDR_INVALID; 383 + int err = 0, pos = 0; 384 + bool sram = dir == XE_SVM_COPY_TO_SRAM; 385 + 386 + /* 387 + * This flow is complex: it locates physically contiguous device pages, 388 + * derives the starting physical address, and performs a single GPU copy 389 + * to for every 8M chunk in a DMA address array. Both device pages and 390 + * DMA addresses may be sparsely populated. If either is NULL, a copy is 391 + * triggered based on the current search state. The last GPU copy is 392 + * waited on to ensure all copies are complete. 393 + */ 394 + 395 + for (i = 0; i < npages; ++i) { 396 + struct page *spage = pages[i]; 397 + struct dma_fence *__fence; 398 + u64 __vram_addr; 399 + bool match = false, chunk, last; 400 + 401 + #define XE_MIGRATE_CHUNK_SIZE SZ_8M 402 + chunk = (i - pos) == (XE_MIGRATE_CHUNK_SIZE / PAGE_SIZE); 403 + last = (i + 1) == npages; 404 + 405 + /* No CPU page and no device pages queue'd to copy */ 406 + if (!dma_addr[i] && vram_addr == XE_VRAM_ADDR_INVALID) 407 + continue; 408 + 409 + if (!vr && spage) { 410 + vr = page_to_vr(spage); 411 + tile = vr_to_tile(vr); 412 + } 413 + XE_WARN_ON(spage && page_to_vr(spage) != vr); 414 + 415 + /* 416 + * CPU page and device page valid, capture physical address on 417 + * first device page, check if physical contiguous on subsequent 418 + * device pages. 
419 + */ 420 + if (dma_addr[i] && spage) { 421 + __vram_addr = xe_vram_region_page_to_dpa(vr, spage); 422 + if (vram_addr == XE_VRAM_ADDR_INVALID) { 423 + vram_addr = __vram_addr; 424 + pos = i; 425 + } 426 + 427 + match = vram_addr + PAGE_SIZE * (i - pos) == __vram_addr; 428 + } 429 + 430 + /* 431 + * Mismatched physical address, 8M copy chunk, or last page - 432 + * trigger a copy. 433 + */ 434 + if (!match || chunk || last) { 435 + /* 436 + * Extra page for first copy if last page and matching 437 + * physical address. 438 + */ 439 + int incr = (match && last) ? 1 : 0; 440 + 441 + if (vram_addr != XE_VRAM_ADDR_INVALID) { 442 + if (sram) { 443 + vm_dbg(&tile->xe->drm, 444 + "COPY TO SRAM - 0x%016llx -> 0x%016llx, NPAGES=%ld", 445 + vram_addr, (u64)dma_addr[pos], i - pos + incr); 446 + __fence = xe_migrate_from_vram(tile->migrate, 447 + i - pos + incr, 448 + vram_addr, 449 + dma_addr + pos); 450 + } else { 451 + vm_dbg(&tile->xe->drm, 452 + "COPY TO VRAM - 0x%016llx -> 0x%016llx, NPAGES=%ld", 453 + (u64)dma_addr[pos], vram_addr, i - pos + incr); 454 + __fence = xe_migrate_to_vram(tile->migrate, 455 + i - pos + incr, 456 + dma_addr + pos, 457 + vram_addr); 458 + } 459 + if (IS_ERR(__fence)) { 460 + err = PTR_ERR(__fence); 461 + goto err_out; 462 + } 463 + 464 + dma_fence_put(fence); 465 + fence = __fence; 466 + } 467 + 468 + /* Setup physical address of next device page */ 469 + if (dma_addr[i] && spage) { 470 + vram_addr = __vram_addr; 471 + pos = i; 472 + } else { 473 + vram_addr = XE_VRAM_ADDR_INVALID; 474 + } 475 + 476 + /* Extra mismatched device page, copy it */ 477 + if (!match && last && vram_addr != XE_VRAM_ADDR_INVALID) { 478 + if (sram) { 479 + vm_dbg(&tile->xe->drm, 480 + "COPY TO SRAM - 0x%016llx -> 0x%016llx, NPAGES=%d", 481 + vram_addr, (u64)dma_addr[pos], 1); 482 + __fence = xe_migrate_from_vram(tile->migrate, 1, 483 + vram_addr, 484 + dma_addr + pos); 485 + } else { 486 + vm_dbg(&tile->xe->drm, 487 + "COPY TO VRAM - 0x%016llx -> 0x%016llx, 
NPAGES=%d", 488 + (u64)dma_addr[pos], vram_addr, 1); 489 + __fence = xe_migrate_to_vram(tile->migrate, 1, 490 + dma_addr + pos, 491 + vram_addr); 492 + } 493 + if (IS_ERR(__fence)) { 494 + err = PTR_ERR(__fence); 495 + goto err_out; 496 + } 497 + 498 + dma_fence_put(fence); 499 + fence = __fence; 500 + } 501 + } 502 + } 503 + 504 + err_out: 505 + /* Wait for all copies to complete */ 506 + if (fence) { 507 + dma_fence_wait(fence, false); 508 + dma_fence_put(fence); 509 + } 510 + 511 + return err; 512 + #undef XE_MIGRATE_CHUNK_SIZE 513 + #undef XE_VRAM_ADDR_INVALID 514 + } 515 + 516 + static int xe_svm_copy_to_devmem(struct page **pages, dma_addr_t *dma_addr, 517 + unsigned long npages) 518 + { 519 + return xe_svm_copy(pages, dma_addr, npages, XE_SVM_COPY_TO_VRAM); 520 + } 521 + 522 + static int xe_svm_copy_to_ram(struct page **pages, dma_addr_t *dma_addr, 523 + unsigned long npages) 524 + { 525 + return xe_svm_copy(pages, dma_addr, npages, XE_SVM_COPY_TO_SRAM); 526 + } 527 + 528 + static struct xe_bo *to_xe_bo(struct drm_gpusvm_devmem *devmem_allocation) 529 + { 530 + return container_of(devmem_allocation, struct xe_bo, devmem_allocation); 531 + } 532 + 533 + static void xe_svm_devmem_release(struct drm_gpusvm_devmem *devmem_allocation) 534 + { 535 + struct xe_bo *bo = to_xe_bo(devmem_allocation); 536 + 537 + xe_bo_put_async(bo); 538 + } 539 + 540 + static u64 block_offset_to_pfn(struct xe_vram_region *vr, u64 offset) 541 + { 542 + return PHYS_PFN(offset + vr->hpa_base); 543 + } 544 + 545 + static struct drm_buddy *tile_to_buddy(struct xe_tile *tile) 546 + { 547 + return &tile->mem.vram.ttm.mm; 548 + } 549 + 550 + static int xe_svm_populate_devmem_pfn(struct drm_gpusvm_devmem *devmem_allocation, 551 + unsigned long npages, unsigned long *pfn) 552 + { 553 + struct xe_bo *bo = to_xe_bo(devmem_allocation); 554 + struct ttm_resource *res = bo->ttm.resource; 555 + struct list_head *blocks = &to_xe_ttm_vram_mgr_resource(res)->blocks; 556 + struct drm_buddy_block *block; 
557 + int j = 0; 558 + 559 + list_for_each_entry(block, blocks, link) { 560 + struct xe_vram_region *vr = block->private; 561 + struct xe_tile *tile = vr_to_tile(vr); 562 + struct drm_buddy *buddy = tile_to_buddy(tile); 563 + u64 block_pfn = block_offset_to_pfn(vr, drm_buddy_block_offset(block)); 564 + int i; 565 + 566 + for (i = 0; i < drm_buddy_block_size(buddy, block) >> PAGE_SHIFT; ++i) 567 + pfn[j++] = block_pfn + i; 568 + } 569 + 570 + return 0; 571 + } 572 + 573 + static const struct drm_gpusvm_devmem_ops gpusvm_devmem_ops = { 574 + .devmem_release = xe_svm_devmem_release, 575 + .populate_devmem_pfn = xe_svm_populate_devmem_pfn, 576 + .copy_to_devmem = xe_svm_copy_to_devmem, 577 + .copy_to_ram = xe_svm_copy_to_ram, 578 + }; 579 + 580 + static const struct drm_gpusvm_ops gpusvm_ops = { 581 + .range_alloc = xe_svm_range_alloc, 582 + .range_free = xe_svm_range_free, 583 + .invalidate = xe_svm_invalidate, 584 + }; 585 + 586 + static const unsigned long fault_chunk_sizes[] = { 587 + SZ_2M, 588 + SZ_64K, 589 + SZ_4K, 590 + }; 591 + 592 + /** 593 + * xe_svm_init() - SVM initialize 594 + * @vm: The VM. 595 + * 596 + * Initialize SVM state which is embedded within the VM. 597 + * 598 + * Return: 0 on success, negative error code on error. 
599 + */ 600 + int xe_svm_init(struct xe_vm *vm) 601 + { 602 + int err; 603 + 604 + spin_lock_init(&vm->svm.garbage_collector.lock); 605 + INIT_LIST_HEAD(&vm->svm.garbage_collector.range_list); 606 + INIT_WORK(&vm->svm.garbage_collector.work, 607 + xe_svm_garbage_collector_work_func); 608 + 609 + err = drm_gpusvm_init(&vm->svm.gpusvm, "Xe SVM", &vm->xe->drm, 610 + current->mm, xe_svm_devm_owner(vm->xe), 0, 611 + vm->size, xe_modparam.svm_notifier_size * SZ_1M, 612 + &gpusvm_ops, fault_chunk_sizes, 613 + ARRAY_SIZE(fault_chunk_sizes)); 614 + if (err) 615 + return err; 616 + 617 + drm_gpusvm_driver_set_lock(&vm->svm.gpusvm, &vm->lock); 618 + 619 + return 0; 620 + } 621 + 622 + /** 623 + * xe_svm_close() - SVM close 624 + * @vm: The VM. 625 + * 626 + * Close SVM state (i.e., stop and flush all SVM actions). 627 + */ 628 + void xe_svm_close(struct xe_vm *vm) 629 + { 630 + xe_assert(vm->xe, xe_vm_is_closed(vm)); 631 + flush_work(&vm->svm.garbage_collector.work); 632 + } 633 + 634 + /** 635 + * xe_svm_fini() - SVM finalize 636 + * @vm: The VM. 637 + * 638 + * Finalize SVM state which is embedded within the VM. 
639 + */ 640 + void xe_svm_fini(struct xe_vm *vm) 641 + { 642 + xe_assert(vm->xe, xe_vm_is_closed(vm)); 643 + 644 + drm_gpusvm_fini(&vm->svm.gpusvm); 645 + } 646 + 647 + static bool xe_svm_range_is_valid(struct xe_svm_range *range, 648 + struct xe_tile *tile) 649 + { 650 + return (range->tile_present & ~range->tile_invalidated) & BIT(tile->id); 651 + } 652 + 653 + static struct xe_vram_region *tile_to_vr(struct xe_tile *tile) 654 + { 655 + return &tile->mem.vram; 656 + } 657 + 658 + static int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile, 659 + struct xe_svm_range *range, 660 + const struct drm_gpusvm_ctx *ctx) 661 + { 662 + struct mm_struct *mm = vm->svm.gpusvm.mm; 663 + struct xe_vram_region *vr = tile_to_vr(tile); 664 + struct drm_buddy_block *block; 665 + struct list_head *blocks; 666 + struct xe_bo *bo; 667 + ktime_t end = 0; 668 + int err; 669 + 670 + range_debug(range, "ALLOCATE VRAM"); 671 + 672 + if (!mmget_not_zero(mm)) 673 + return -EFAULT; 674 + mmap_read_lock(mm); 675 + 676 + retry: 677 + bo = xe_bo_create_locked(tile_to_xe(tile), NULL, NULL, 678 + xe_svm_range_size(range), 679 + ttm_bo_type_device, 680 + XE_BO_FLAG_VRAM_IF_DGFX(tile) | 681 + XE_BO_FLAG_CPU_ADDR_MIRROR); 682 + if (IS_ERR(bo)) { 683 + err = PTR_ERR(bo); 684 + if (xe_vm_validate_should_retry(NULL, err, &end)) 685 + goto retry; 686 + goto unlock; 687 + } 688 + 689 + drm_gpusvm_devmem_init(&bo->devmem_allocation, 690 + vm->xe->drm.dev, mm, 691 + &gpusvm_devmem_ops, 692 + &tile->mem.vram.dpagemap, 693 + xe_svm_range_size(range)); 694 + 695 + blocks = &to_xe_ttm_vram_mgr_resource(bo->ttm.resource)->blocks; 696 + list_for_each_entry(block, blocks, link) 697 + block->private = vr; 698 + 699 + err = drm_gpusvm_migrate_to_devmem(&vm->svm.gpusvm, &range->base, 700 + &bo->devmem_allocation, ctx); 701 + xe_bo_unlock(bo); 702 + if (err) 703 + xe_bo_put(bo); /* Creation ref */ 704 + 705 + unlock: 706 + mmap_read_unlock(mm); 707 + mmput(mm); 708 + 709 + return err; 710 + } 711 + 712 + /** 
713 + * xe_svm_handle_pagefault() - SVM handle page fault 714 + * @vm: The VM. 715 + * @vma: The CPU address mirror VMA. 716 + * @tile: The tile upon the fault occurred. 717 + * @fault_addr: The GPU fault address. 718 + * @atomic: The fault atomic access bit. 719 + * 720 + * Create GPU bindings for a SVM page fault. Optionally migrate to device 721 + * memory. 722 + * 723 + * Return: 0 on success, negative error code on error. 724 + */ 725 + int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma, 726 + struct xe_tile *tile, u64 fault_addr, 727 + bool atomic) 728 + { 729 + struct drm_gpusvm_ctx ctx = { 730 + .read_only = xe_vma_read_only(vma), 731 + .devmem_possible = IS_DGFX(vm->xe) && 732 + IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR), 733 + .check_pages_threshold = IS_DGFX(vm->xe) && 734 + IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR) ? SZ_64K : 0, 735 + }; 736 + struct xe_svm_range *range; 737 + struct drm_gpusvm_range *r; 738 + struct drm_exec exec; 739 + struct dma_fence *fence; 740 + ktime_t end = 0; 741 + int err; 742 + 743 + lockdep_assert_held_write(&vm->lock); 744 + xe_assert(vm->xe, xe_vma_is_cpu_addr_mirror(vma)); 745 + 746 + retry: 747 + /* Always process UNMAPs first so view SVM ranges is current */ 748 + err = xe_svm_garbage_collector(vm); 749 + if (err) 750 + return err; 751 + 752 + r = drm_gpusvm_range_find_or_insert(&vm->svm.gpusvm, fault_addr, 753 + xe_vma_start(vma), xe_vma_end(vma), 754 + &ctx); 755 + if (IS_ERR(r)) 756 + return PTR_ERR(r); 757 + 758 + range = to_xe_range(r); 759 + if (xe_svm_range_is_valid(range, tile)) 760 + return 0; 761 + 762 + range_debug(range, "PAGE FAULT"); 763 + 764 + /* XXX: Add migration policy, for now migrate range once */ 765 + if (!range->skip_migrate && range->base.flags.migrate_devmem && 766 + xe_svm_range_size(range) >= SZ_64K) { 767 + range->skip_migrate = true; 768 + 769 + err = xe_svm_alloc_vram(vm, tile, range, &ctx); 770 + if (err) { 771 + drm_dbg(&vm->xe->drm, 772 + "VRAM allocation failed, falling back 
to " 773 + "retrying fault, asid=%u, errno=%pe\n", 774 + vm->usm.asid, ERR_PTR(err)); 775 + goto retry; 776 + } 777 + } 778 + 779 + range_debug(range, "GET PAGES"); 780 + err = drm_gpusvm_range_get_pages(&vm->svm.gpusvm, r, &ctx); 781 + /* Corner where CPU mappings have changed */ 782 + if (err == -EOPNOTSUPP || err == -EFAULT || err == -EPERM) { 783 + if (err == -EOPNOTSUPP) { 784 + range_debug(range, "PAGE FAULT - EVICT PAGES"); 785 + drm_gpusvm_range_evict(&vm->svm.gpusvm, &range->base); 786 + } 787 + drm_dbg(&vm->xe->drm, 788 + "Get pages failed, falling back to retrying, asid=%u, gpusvm=%p, errno=%pe\n", 789 + vm->usm.asid, &vm->svm.gpusvm, ERR_PTR(err)); 790 + range_debug(range, "PAGE FAULT - RETRY PAGES"); 791 + goto retry; 792 + } 793 + if (err) { 794 + range_debug(range, "PAGE FAULT - FAIL PAGE COLLECT"); 795 + goto err_out; 796 + } 797 + 798 + range_debug(range, "PAGE FAULT - BIND"); 799 + 800 + retry_bind: 801 + drm_exec_init(&exec, 0, 0); 802 + drm_exec_until_all_locked(&exec) { 803 + err = drm_exec_lock_obj(&exec, vm->gpuvm.r_obj); 804 + drm_exec_retry_on_contention(&exec); 805 + if (err) { 806 + drm_exec_fini(&exec); 807 + goto err_out; 808 + } 809 + 810 + fence = xe_vm_range_rebind(vm, vma, range, BIT(tile->id)); 811 + if (IS_ERR(fence)) { 812 + drm_exec_fini(&exec); 813 + err = PTR_ERR(fence); 814 + if (err == -EAGAIN) { 815 + range_debug(range, "PAGE FAULT - RETRY BIND"); 816 + goto retry; 817 + } 818 + if (xe_vm_validate_should_retry(&exec, err, &end)) 819 + goto retry_bind; 820 + goto err_out; 821 + } 822 + } 823 + drm_exec_fini(&exec); 824 + 825 + if (xe_modparam.always_migrate_to_vram) 826 + range->skip_migrate = false; 827 + 828 + dma_fence_wait(fence, false); 829 + dma_fence_put(fence); 830 + 831 + err_out: 832 + 833 + return err; 834 + } 835 + 836 + /** 837 + * xe_svm_has_mapping() - SVM has mappings 838 + * @vm: The VM. 839 + * @start: Start address. 840 + * @end: End address. 841 + * 842 + * Check if an address range has SVM mappings. 
843 + * 844 + * Return: True if the address range has an SVM mapping, False otherwise 845 + */ 846 + bool xe_svm_has_mapping(struct xe_vm *vm, u64 start, u64 end) 847 + { 848 + return drm_gpusvm_has_mapping(&vm->svm.gpusvm, start, end); 849 + } 850 + 851 + /** 852 + * xe_svm_bo_evict() - SVM evict BO to system memory 853 + * @bo: BO to evict 854 + * 855 + * SVM evict BO to system memory. GPU SVM layer ensures all device pages 856 + * are evicted before returning. 857 + * 858 + * Return: 0 on success, standard error code otherwise 859 + */ 860 + int xe_svm_bo_evict(struct xe_bo *bo) 861 + { 862 + return drm_gpusvm_evict_to_ram(&bo->devmem_allocation); 863 + } 864 + 865 + #if IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR) 866 + static struct drm_pagemap_device_addr 867 + xe_drm_pagemap_device_map(struct drm_pagemap *dpagemap, 868 + struct device *dev, 869 + struct page *page, 870 + unsigned int order, 871 + enum dma_data_direction dir) 872 + { 873 + struct device *pgmap_dev = dpagemap->dev; 874 + enum drm_interconnect_protocol prot; 875 + dma_addr_t addr; 876 + 877 + if (pgmap_dev == dev) { 878 + addr = xe_vram_region_page_to_dpa(page_to_vr(page), page); 879 + prot = XE_INTERCONNECT_VRAM; 880 + } else { 881 + addr = DMA_MAPPING_ERROR; 882 + prot = 0; 883 + } 884 + 885 + return drm_pagemap_device_addr_encode(addr, prot, order, dir); 886 + } 887 + 888 + static const struct drm_pagemap_ops xe_drm_pagemap_ops = { 889 + .device_map = xe_drm_pagemap_device_map, 890 + }; 891 + 892 + /** 893 + * xe_devm_add: Remap and provide memmap backing for device memory 894 + * @tile: tile that the memory region belongs to 895 + * @vr: vram memory region to remap 896 + * 897 + * This remaps device memory to host physical address space and creates 898 + * struct pages to back device memory. 899 + * 900 + * Return: 0 on success, standard error code otherwise 901 + */ 902 + int xe_devm_add(struct xe_tile *tile, struct xe_vram_region *vr) 903 + { 904 + struct xe_device *xe = tile_to_xe(tile); 905 + struct 
device *dev = &to_pci_dev(xe->drm.dev)->dev; 906 + struct resource *res; 907 + void *addr; 908 + int ret; 909 + 910 + res = devm_request_free_mem_region(dev, &iomem_resource, 911 + vr->usable_size); 912 + if (IS_ERR(res)) { 913 + ret = PTR_ERR(res); 914 + return ret; 915 + } 916 + 917 + vr->pagemap.type = MEMORY_DEVICE_PRIVATE; 918 + vr->pagemap.range.start = res->start; 919 + vr->pagemap.range.end = res->end; 920 + vr->pagemap.nr_range = 1; 921 + vr->pagemap.ops = drm_gpusvm_pagemap_ops_get(); 922 + vr->pagemap.owner = xe_svm_devm_owner(xe); 923 + addr = devm_memremap_pages(dev, &vr->pagemap); 924 + 925 + vr->dpagemap.dev = dev; 926 + vr->dpagemap.ops = &xe_drm_pagemap_ops; 927 + 928 + if (IS_ERR(addr)) { 929 + devm_release_mem_region(dev, res->start, resource_size(res)); 930 + ret = PTR_ERR(addr); 931 + drm_err(&xe->drm, "Failed to remap tile %d memory, errno %pe\n", 932 + tile->id, ERR_PTR(ret)); 933 + return ret; 934 + } 935 + vr->hpa_base = res->start; 936 + 937 + drm_dbg(&xe->drm, "Added tile %d memory [%llx-%llx] to devm, remapped to %pr\n", 938 + tile->id, vr->io_start, vr->io_start + vr->usable_size, res); 939 + return 0; 940 + } 941 + #else 942 + int xe_devm_add(struct xe_tile *tile, struct xe_vram_region *vr) 943 + { 944 + return 0; 945 + } 946 + #endif
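
The comment at the top of xe_svm_copy() describes grouping physically contiguous device pages into single copies of at most 8M each. The following is a simplified userspace sketch of that grouping step only, not the driver code: it folds the separate pages and dma_addr arrays into one address array, and every name in it (sketch_coalesce, struct sketch_copy, SKETCH_PAGE_SIZE, SKETCH_ADDR_INVALID) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE    4096u       /* assumed page size */
#define SKETCH_ADDR_INVALID UINT64_MAX  /* "nothing to copy at this index" */

/* One record per copy that would be issued (hypothetical type). */
struct sketch_copy {
	uint64_t dev_addr;  /* device address of the first page in the run */
	size_t first;       /* index of the first page in the run */
	size_t npages;      /* number of pages covered by the copy */
};

/*
 * Walk dev_addr[0..n) and emit one record per run of entries that are
 * valid, physically contiguous, and no longer than max_pages.
 * out must have room for n records. Returns the number of records.
 */
size_t sketch_coalesce(const uint64_t *dev_addr, size_t n, size_t max_pages,
		       struct sketch_copy *out)
{
	size_t nout = 0, start = 0, i;
	int open = 0;  /* non-zero while a run is being accumulated */

	assert(max_pages > 0);

	for (i = 0; i < n; i++) {
		int valid = dev_addr[i] != SKETCH_ADDR_INVALID;
		/* Page i extends the run if it is valid, contiguous, and under the cap */
		int extends = open && valid && (i - start) < max_pages &&
			dev_addr[i] == dev_addr[start] +
				       (uint64_t)(i - start) * SKETCH_PAGE_SIZE;

		if (open && !extends) {
			/* Flush the run [start, i) as one copy */
			out[nout].dev_addr = dev_addr[start];
			out[nout].first = start;
			out[nout].npages = i - start;
			nout++;
			open = 0;
		}
		if (valid && !open) {
			/* Begin a new run at page i */
			start = i;
			open = 1;
		}
	}

	if (open) {
		/* Flush the trailing run */
		out[nout].dev_addr = dev_addr[start];
		out[nout].first = start;
		out[nout].npages = n - start;
		nout++;
	}

	return nout;
}
```

With a cap of 2048 pages (8M at the assumed 4K page size), a fully contiguous 16M range would yield two records. The driver additionally tracks the last fence so that it only waits once, which this sketch leaves out.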

+150

drivers/gpu/drm/xe/xe_svm.h

···
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2024 Intel Corporation
+ */
+
+#ifndef _XE_SVM_H_
+#define _XE_SVM_H_
+
+#include <drm/drm_pagemap.h>
+#include <drm/drm_gpusvm.h>
+
+#define XE_INTERCONNECT_VRAM DRM_INTERCONNECT_DRIVER
+
+struct xe_bo;
+struct xe_vram_region;
+struct xe_tile;
+struct xe_vm;
+struct xe_vma;
+
+/** struct xe_svm_range - SVM range */
+struct xe_svm_range {
+	/** @base: base drm_gpusvm_range */
+	struct drm_gpusvm_range base;
+	/**
+	 * @garbage_collector_link: Link into VM's garbage collect SVM range
+	 * list. Protected by VM's garbage collect lock.
+	 */
+	struct list_head garbage_collector_link;
+	/**
+	 * @tile_present: Tile mask of binding is present for this range.
+	 * Protected by GPU SVM notifier lock.
+	 */
+	u8 tile_present;
+	/**
+	 * @tile_invalidated: Tile mask of binding is invalidated for this
+	 * range. Protected by GPU SVM notifier lock.
+	 */
+	u8 tile_invalidated;
+	/**
+	 * @skip_migrate: Skip migration to VRAM, protected by GPU fault handler
+	 * locking.
+	 */
+	u8 skip_migrate	:1;
+};
+
+#if IS_ENABLED(CONFIG_DRM_GPUSVM)
+/**
+ * xe_svm_range_pages_valid() - SVM range pages valid
+ * @range: SVM range
+ *
+ * Return: True if SVM range pages are valid, False otherwise
+ */
+static inline bool xe_svm_range_pages_valid(struct xe_svm_range *range)
+{
+	return drm_gpusvm_range_pages_valid(range->base.gpusvm, &range->base);
+}
+
+int xe_devm_add(struct xe_tile *tile, struct xe_vram_region *vr);
+
+int xe_svm_init(struct xe_vm *vm);
+
+void xe_svm_fini(struct xe_vm *vm);
+
+void xe_svm_close(struct xe_vm *vm);
+
+int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
+			    struct xe_tile *tile, u64 fault_addr,
+			    bool atomic);
+
+bool xe_svm_has_mapping(struct xe_vm *vm, u64 start, u64 end);
+
+int xe_svm_bo_evict(struct xe_bo *bo);
+
+void xe_svm_range_debug(struct xe_svm_range *range, const char *operation);
+#else
+static inline bool xe_svm_range_pages_valid(struct xe_svm_range *range)
+{
+	return false;
+}
+
+static inline
+int xe_devm_add(struct xe_tile *tile, struct xe_vram_region *vr)
+{
+	return 0;
+}
+
+static inline
+int xe_svm_init(struct xe_vm *vm)
+{
+	return 0;
+}
+
+static inline
+void xe_svm_fini(struct xe_vm *vm)
+{
+}
+
+static inline
+void xe_svm_close(struct xe_vm *vm)
+{
+}
+
+static inline
+int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
+			    struct xe_tile *tile, u64 fault_addr,
+			    bool atomic)
+{
+	return 0;
+}
+
+static inline
+bool xe_svm_has_mapping(struct xe_vm *vm, u64 start, u64 end)
+{
+	return false;
+}
+
+static inline
+int xe_svm_bo_evict(struct xe_bo *bo)
+{
+	return 0;
+}
+
+static inline
+void xe_svm_range_debug(struct xe_svm_range *range, const char *operation)
+{
+}
+#endif
+
+/**
+ * xe_svm_range_has_dma_mapping() - SVM range has DMA mapping
+ * @range: SVM range
+ *
+ * Return: True if SVM range has a DMA mapping, False otherwise
+ */
+static inline bool xe_svm_range_has_dma_mapping(struct xe_svm_range *range)
+{
+	lockdep_assert_held(&range->base.gpusvm->notifier_lock);
+	return range->base.flags.has_dma_mapping;
+}
+
+#define xe_svm_assert_in_notifier(vm__) \
+	lockdep_assert_held_write(&(vm__)->svm.gpusvm.notifier_lock)
+
+#define xe_svm_notifier_lock(vm__)	\
+	drm_gpusvm_notifier_lock(&(vm__)->svm.gpusvm)
+
+#define xe_svm_notifier_unlock(vm__)	\
+	drm_gpusvm_notifier_unlock(&(vm__)->svm.gpusvm)
+
+#endif
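
The tile_present and tile_invalidated fields of struct xe_svm_range are per-tile bitmasks, and xe_svm_range_is_valid() in xe_svm.c treats a range as valid for a tile only when that tile's bit is set in the first mask and clear in the second. A minimal standalone sketch of the same test, using a hypothetical struct sketch_range and fixed-width userspace types in place of the kernel ones:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two mask fields of struct xe_svm_range. */
struct sketch_range {
	uint8_t tile_present;      /* bit N set: a binding exists on tile N */
	uint8_t tile_invalidated;  /* bit N set: that binding was invalidated */
};

/* A range is usable on a tile only if it is present and not invalidated. */
bool sketch_range_is_valid(const struct sketch_range *r, unsigned int tile_id)
{
	assert(tile_id < 8);  /* the masks are 8 bits wide */
	return ((r->tile_present & ~r->tile_invalidated) >> tile_id) & 1u;
}
```

Keeping two masks rather than clearing tile_present on invalidation lets the notifier record an invalidation without discarding the fact that a binding exists; whether that is the authors' motivation is an assumption here.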

+5

drivers/gpu/drm/xe/xe_tile.c

···
 #include "xe_migrate.h"
 #include "xe_pcode.h"
 #include "xe_sa.h"
+#include "xe_svm.h"
 #include "xe_tile.h"
 #include "xe_tile_sysfs.h"
 #include "xe_ttm_vram_mgr.h"
···
  */
 int xe_tile_init_noalloc(struct xe_tile *tile)
 {
+	struct xe_device *xe = tile_to_xe(tile);
 	int err;
 
 	err = tile_ttm_mgr_init(tile);
···
 		return err;
 
 	xe_wa_apply_tile_workarounds(tile);
+
+	if (xe->info.has_usm && IS_DGFX(xe))
+		xe_devm_add(tile, &tile->mem.vram);
 
 	return xe_tile_sysfs_init(tile);
 }

+30

drivers/gpu/drm/xe/xe_trace.h

···
 	TP_ARGS(xe, caller)
 );
 
+TRACE_EVENT(xe_eu_stall_data_read,
+	    TP_PROTO(u8 slice, u8 subslice,
+		     u32 read_ptr, u32 write_ptr,
+		     size_t read_size, size_t total_size),
+	    TP_ARGS(slice, subslice,
+		    read_ptr, write_ptr,
+		    read_size, total_size),
+
+	    TP_STRUCT__entry(__field(u8, slice)
+			     __field(u8, subslice)
+			     __field(u32, read_ptr)
+			     __field(u32, write_ptr)
+			     __field(size_t, read_size)
+			     __field(size_t, total_size)
+			     ),
+
+	    TP_fast_assign(__entry->slice = slice;
+			   __entry->subslice = subslice;
+			   __entry->read_ptr = read_ptr;
+			   __entry->write_ptr = write_ptr;
+			   __entry->read_size = read_size;
+			   __entry->total_size = total_size;
+			   ),
+
+	    TP_printk("slice: %u subslice: %u read ptr: 0x%x write ptr: 0x%x read size: %zu total read size: %zu",
+		      __entry->slice, __entry->subslice,
+		      __entry->read_ptr, __entry->write_ptr,
+		      __entry->read_size, __entry->total_size)
+);
+
 #endif
 
 /* This part must be outside protection */

+49

drivers/gpu/drm/xe/xe_trace_guc.h

···
 
 #include "xe_device_types.h"
 #include "xe_guc_exec_queue_types.h"
+#include "xe_guc_engine_activity_types.h"
 
 #define __dev_name_xe(xe)	dev_name((xe)->drm.dev)
 
···
 
 );
 
+TRACE_EVENT(xe_guc_engine_activity,
+	    TP_PROTO(struct xe_device *xe, struct engine_activity *ea, const char *name,
+		     u16 instance),
+	    TP_ARGS(xe, ea, name, instance),
+
+	    TP_STRUCT__entry(
+		    __string(dev, __dev_name_xe(xe))
+		    __string(name, name)
+		    __field(u32, global_change_num)
+		    __field(u32, guc_tsc_frequency_hz)
+		    __field(u32, lag_latency_usec)
+		    __field(u16, instance)
+		    __field(u16, change_num)
+		    __field(u16, quanta_ratio)
+		    __field(u32, last_update_tick)
+		    __field(u64, active_ticks)
+		    __field(u64, active)
+		    __field(u64, total)
+		    __field(u64, quanta)
+		    __field(u64, last_cpu_ts)
+		    ),
+
+	    TP_fast_assign(
+		    __assign_str(dev);
+		    __assign_str(name);
+		    __entry->global_change_num = ea->metadata.global_change_num;
+		    __entry->guc_tsc_frequency_hz = ea->metadata.guc_tsc_frequency_hz;
+		    __entry->lag_latency_usec = ea->metadata.lag_latency_usec;
+		    __entry->instance = instance;
+		    __entry->change_num = ea->activity.change_num;
+		    __entry->quanta_ratio = ea->activity.quanta_ratio;
+		    __entry->last_update_tick = ea->activity.last_update_tick;
+		    __entry->active_ticks = ea->activity.active_ticks;
+		    __entry->active = ea->active;
+		    __entry->total = ea->total;
+		    __entry->quanta = ea->quanta;
+		    __entry->last_cpu_ts = ea->last_cpu_ts;
+		    ),
+
+	    TP_printk("dev=%s engine %s:%d Active=%llu, quanta=%llu, last_cpu_ts=%llu\n"
+		      "Activity metadata: global_change_num=%u, guc_tsc_frequency_hz=%u lag_latency_usec=%u\n"
+		      "Activity data: change_num=%u, quanta_ratio=0x%x, last_update_tick=%u, active_ticks=%llu\n",
+		      __get_str(dev), __get_str(name), __entry->instance,
+		      (__entry->active + __entry->total), __entry->quanta, __entry->last_cpu_ts,
+		      __entry->global_change_num, __entry->guc_tsc_frequency_hz,
+		      __entry->lag_latency_usec, __entry->change_num, __entry->quanta_ratio,
+		      __entry->last_update_tick, __entry->active_ticks)
+);
 #endif
 
 /* This part must be outside protection */

+64 -8

drivers/gpu/drm/xe/xe_tuning.c

···
 
 #include <kunit/visibility.h>
 
+#include <drm/drm_managed.h>
+
 #include "regs/xe_gt_regs.h"
 #include "xe_gt_types.h"
 #include "xe_platform_types.h"
···
 };
 
 static const struct xe_rtp_entry_sr engine_tunings[] = {
+	{ XE_RTP_NAME("Tuning: L3 Hashing Mask"),
+	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1200, 1210),
+		       FUNC(xe_rtp_match_first_render_or_compute)),
+	  XE_RTP_ACTIONS(CLR(XELP_GARBCNTL, XELP_BUS_HASH_CTL_BIT_EXC))
+	},
 	{ XE_RTP_NAME("Tuning: Set Indirect State Override"),
 	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1200, 1274),
 		       ENGINE_CLASS(RENDER)),
···
 };
 
 static const struct xe_rtp_entry_sr lrc_tunings[] = {
-	{ XE_RTP_NAME("Tuning: ganged timer, also known as 16011163337"),
-	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1200, 1210), ENGINE_CLASS(RENDER)),
-	  /* read verification is ignored due to 1608008084. */
-	  XE_RTP_ACTIONS(FIELD_SET_NO_READ_MASK(FF_MODE2,
-						FF_MODE2_GS_TIMER_MASK,
-						FF_MODE2_GS_TIMER_224))
-	},
-
 	/* DG2 */
 
 	{ XE_RTP_NAME("Tuning: L3 cache"),
···
 	{}
 };
 
+/**
+ * xe_tuning_init - initialize gt with tunings bookkeeping
+ * @gt: GT instance to initialize
+ *
+ * Returns 0 for success, negative error code otherwise.
+ */
+int xe_tuning_init(struct xe_gt *gt)
+{
+	struct xe_device *xe = gt_to_xe(gt);
+	size_t n_lrc, n_engine, n_gt, total;
+	unsigned long *p;
+
+	n_gt = BITS_TO_LONGS(ARRAY_SIZE(gt_tunings));
+	n_engine = BITS_TO_LONGS(ARRAY_SIZE(engine_tunings));
+	n_lrc = BITS_TO_LONGS(ARRAY_SIZE(lrc_tunings));
+	total = n_gt + n_engine + n_lrc;
+
+	p = drmm_kzalloc(&xe->drm, sizeof(*p) * total, GFP_KERNEL);
+	if (!p)
+		return -ENOMEM;
+
+	gt->tuning_active.gt = p;
+	p += n_gt;
+	gt->tuning_active.engine = p;
+	p += n_engine;
+	gt->tuning_active.lrc = p;
+
+	return 0;
+}
+ALLOW_ERROR_INJECTION(xe_tuning_init, ERRNO); /* See xe_pci_probe() */
+
 void xe_tuning_process_gt(struct xe_gt *gt)
 {
 	struct xe_rtp_process_ctx ctx = XE_RTP_PROCESS_CTX_INITIALIZER(gt);
 
+	xe_rtp_process_ctx_enable_active_tracking(&ctx,
+						  gt->tuning_active.gt,
+						  ARRAY_SIZE(gt_tunings));
 	xe_rtp_process_to_sr(&ctx, gt_tunings, &gt->reg_sr);
 }
 EXPORT_SYMBOL_IF_KUNIT(xe_tuning_process_gt);
···
 {
 	struct xe_rtp_process_ctx ctx = XE_RTP_PROCESS_CTX_INITIALIZER(hwe);
 
+	xe_rtp_process_ctx_enable_active_tracking(&ctx,
+						  hwe->gt->tuning_active.engine,
+						  ARRAY_SIZE(engine_tunings));
 	xe_rtp_process_to_sr(&ctx, engine_tunings, &hwe->reg_sr);
 }
 EXPORT_SYMBOL_IF_KUNIT(xe_tuning_process_engine);
···
 {
 	struct xe_rtp_process_ctx ctx = XE_RTP_PROCESS_CTX_INITIALIZER(hwe);
 
+	xe_rtp_process_ctx_enable_active_tracking(&ctx,
+						  hwe->gt->tuning_active.lrc,
+						  ARRAY_SIZE(lrc_tunings));
 	xe_rtp_process_to_sr(&ctx, lrc_tunings, &hwe->reg_lrc);
+}
+
+void xe_tuning_dump(struct xe_gt *gt, struct drm_printer *p)
+{
+	size_t idx;
+
+	drm_printf(p, "GT Tunings\n");
+	for_each_set_bit(idx, gt->tuning_active.gt, ARRAY_SIZE(gt_tunings))
+		drm_printf_indent(p, 1, "%s\n", gt_tunings[idx].name);
+
+	drm_printf(p, "\nEngine Tunings\n");
+	for_each_set_bit(idx, gt->tuning_active.engine, ARRAY_SIZE(engine_tunings))
+		drm_printf_indent(p, 1, "%s\n", engine_tunings[idx].name);
+
+	drm_printf(p, "\nLRC Tunings\n");
+	for_each_set_bit(idx, gt->tuning_active.lrc, ARRAY_SIZE(lrc_tunings))
+		drm_printf_indent(p, 1, "%s\n", lrc_tunings[idx].name);
 }
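
xe_tuning_init() sizes three bitmaps in units of unsigned long, makes a single zeroed allocation, and hands out consecutive slices of it. A userspace sketch of that layout follows, assuming calloc() in place of drmm_kzalloc() and hypothetical sketch_* names throughout; unlike the managed kernel allocation, the caller here has to free the returned base pointer.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
/* Number of unsigned longs needed to hold n bits (rounds up) */
#define SKETCH_BITS_TO_LONGS(n) \
	(((size_t)(n) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Hypothetical stand-in for gt->tuning_active */
struct sketch_active {
	unsigned long *gt;
	unsigned long *engine;
	unsigned long *lrc;
};

/*
 * Allocate one zeroed block and carve it into three bitmaps.
 * Returns the base pointer to pass to free(), or NULL on failure.
 * At least one of the bit counts is expected to be non-zero.
 */
unsigned long *sketch_active_init(struct sketch_active *a, size_t gt_bits,
				  size_t engine_bits, size_t lrc_bits)
{
	size_t n_gt = SKETCH_BITS_TO_LONGS(gt_bits);
	size_t n_engine = SKETCH_BITS_TO_LONGS(engine_bits);
	size_t n_lrc = SKETCH_BITS_TO_LONGS(lrc_bits);
	unsigned long *p = calloc(n_gt + n_engine + n_lrc, sizeof(*p));

	if (!p)
		return NULL;

	a->gt = p;                       /* first n_gt longs */
	a->engine = p + n_gt;            /* next n_engine longs */
	a->lrc = p + n_gt + n_engine;    /* final n_lrc longs */
	return p;
}

/* Mark entry idx as active in a bitmap */
void sketch_set_bit(unsigned long *map, size_t idx)
{
	map[idx / SKETCH_BITS_PER_LONG] |= 1UL << (idx % SKETCH_BITS_PER_LONG);
}

/* Return non-zero if entry idx is marked active */
int sketch_test_bit(const unsigned long *map, size_t idx)
{
	return (map[idx / SKETCH_BITS_PER_LONG] >> (idx % SKETCH_BITS_PER_LONG)) & 1UL;
}
```

Rounding each bitmap up to whole longs keeps the three slices from sharing a word, so setting a bit in one table cannot disturb another.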

+3

drivers/gpu/drm/xe/xe_tuning.h

···
 #ifndef _XE_TUNING_
 #define _XE_TUNING_
 
+struct drm_printer;
 struct xe_gt;
 struct xe_hw_engine;
 
+int xe_tuning_init(struct xe_gt *gt);
 void xe_tuning_process_gt(struct xe_gt *gt);
 void xe_tuning_process_engine(struct xe_hw_engine *hwe);
 void xe_tuning_process_lrc(struct xe_hw_engine *hwe);
+void xe_tuning_dump(struct xe_gt *gt, struct drm_printer *p);
 
 #endif

+3

drivers/gpu/drm/xe/xe_uc.c

···
 #include "xe_gt_sriov_vf.h"
 #include "xe_guc.h"
 #include "xe_guc_pc.h"
+#include "xe_guc_engine_activity.h"
 #include "xe_huc.h"
 #include "xe_sriov.h"
 #include "xe_uc_fw.h"
···
 	ret = xe_guc_pc_start(&uc->guc.pc);
 	if (ret)
 		return ret;
+
+	xe_guc_engine_activity_enable_stats(&uc->guc);
 
 	/* We don't fail the driver load if HuC fails to auth, but let's warn */
 	ret = xe_huc_auth(&uc->huc, XE_HUC_AUTH_VIA_GUC);

+427 -94

drivers/gpu/drm/xe/xe_vm.c

··· 8 8 #include <linux/dma-fence-array.h> 9 9 #include <linux/nospec.h> 10 10 11 + #include <drm/drm_drv.h> 11 12 #include <drm/drm_exec.h> 12 13 #include <drm/drm_print.h> 13 14 #include <drm/ttm/ttm_tt.h> ··· 36 35 #include "xe_pt.h" 37 36 #include "xe_pxp.h" 38 37 #include "xe_res_cursor.h" 38 + #include "xe_svm.h" 39 39 #include "xe_sync.h" 40 40 #include "xe_trace_bo.h" 41 41 #include "xe_wa.h" ··· 272 270 273 271 return err; 274 272 } 273 + ALLOW_ERROR_INJECTION(xe_vm_add_compute_exec_queue, ERRNO); 275 274 276 275 /** 277 276 * xe_vm_remove_compute_exec_queue() - Remove compute exec queue from VM ··· 583 580 trace_xe_vm_rebind_worker_exit(vm); 584 581 } 585 582 586 - static bool vma_userptr_invalidate(struct mmu_interval_notifier *mni, 587 - const struct mmu_notifier_range *range, 588 - unsigned long cur_seq) 583 + static void __vma_userptr_invalidate(struct xe_vm *vm, struct xe_userptr_vma *uvma) 589 584 { 590 - struct xe_userptr *userptr = container_of(mni, typeof(*userptr), notifier); 591 - struct xe_userptr_vma *uvma = container_of(userptr, typeof(*uvma), userptr); 585 + struct xe_userptr *userptr = &uvma->userptr; 592 586 struct xe_vma *vma = &uvma->vma; 593 - struct xe_vm *vm = xe_vma_vm(vma); 594 587 struct dma_resv_iter cursor; 595 588 struct dma_fence *fence; 596 589 long err; 597 - 598 - xe_assert(vm->xe, xe_vma_is_userptr(vma)); 599 - trace_xe_vma_userptr_invalidate(vma); 600 - 601 - if (!mmu_notifier_range_blockable(range)) 602 - return false; 603 - 604 - vm_dbg(&xe_vma_vm(vma)->xe->drm, 605 - "NOTIFIER: addr=0x%016llx, range=0x%016llx", 606 - xe_vma_start(vma), xe_vma_size(vma)); 607 - 608 - down_write(&vm->userptr.notifier_lock); 609 - mmu_interval_set_seq(mni, cur_seq); 610 - 611 - /* No need to stop gpu access if the userptr is not yet bound. 
*/ 612 - if (!userptr->initial_bind) { 613 - up_write(&vm->userptr.notifier_lock); 614 - return true; 615 - } 616 590 617 591 /* 618 592 * Tell exec and rebind worker they need to repin and rebind this 619 593 * userptr. 620 594 */ 621 595 if (!xe_vm_in_fault_mode(vm) && 622 - !(vma->gpuva.flags & XE_VMA_DESTROYED) && vma->tile_present) { 596 + !(vma->gpuva.flags & XE_VMA_DESTROYED)) { 623 597 spin_lock(&vm->userptr.invalidated_lock); 624 598 list_move_tail(&userptr->invalidate_link, 625 599 &vm->userptr.invalidated); 626 600 spin_unlock(&vm->userptr.invalidated_lock); 627 601 } 628 - 629 - up_write(&vm->userptr.notifier_lock); 630 602 631 603 /* 632 604 * Preempt fences turn into schedule disables, pipeline these. ··· 620 642 false, MAX_SCHEDULE_TIMEOUT); 621 643 XE_WARN_ON(err <= 0); 622 644 623 - if (xe_vm_in_fault_mode(vm)) { 645 + if (xe_vm_in_fault_mode(vm) && userptr->initial_bind) { 624 646 err = xe_vm_invalidate_vma(vma); 625 647 XE_WARN_ON(err); 626 648 } 627 649 650 + xe_hmm_userptr_unmap(uvma); 651 + } 652 + 653 + static bool vma_userptr_invalidate(struct mmu_interval_notifier *mni, 654 + const struct mmu_notifier_range *range, 655 + unsigned long cur_seq) 656 + { 657 + struct xe_userptr_vma *uvma = container_of(mni, typeof(*uvma), userptr.notifier); 658 + struct xe_vma *vma = &uvma->vma; 659 + struct xe_vm *vm = xe_vma_vm(vma); 660 + 661 + xe_assert(vm->xe, xe_vma_is_userptr(vma)); 662 + trace_xe_vma_userptr_invalidate(vma); 663 + 664 + if (!mmu_notifier_range_blockable(range)) 665 + return false; 666 + 667 + vm_dbg(&xe_vma_vm(vma)->xe->drm, 668 + "NOTIFIER: addr=0x%016llx, range=0x%016llx", 669 + xe_vma_start(vma), xe_vma_size(vma)); 670 + 671 + down_write(&vm->userptr.notifier_lock); 672 + mmu_interval_set_seq(mni, cur_seq); 673 + 674 + __vma_userptr_invalidate(vm, uvma); 675 + up_write(&vm->userptr.notifier_lock); 628 676 trace_xe_vma_userptr_invalidate_complete(vma); 629 677 630 678 return true; ··· 660 656 .invalidate = vma_userptr_invalidate, 661 
657 }; 662 658 659 + #if IS_ENABLED(CONFIG_DRM_XE_USERPTR_INVAL_INJECT) 660 + /** 661 + * xe_vma_userptr_force_invalidate() - force invalidate a userptr 662 + * @uvma: The userptr vma to invalidate 663 + * 664 + * Perform a forced userptr invalidation for testing purposes. 665 + */ 666 + void xe_vma_userptr_force_invalidate(struct xe_userptr_vma *uvma) 667 + { 668 + struct xe_vm *vm = xe_vma_vm(&uvma->vma); 669 + 670 + /* Protect against concurrent userptr pinning */ 671 + lockdep_assert_held(&vm->lock); 672 + /* Protect against concurrent notifiers */ 673 + lockdep_assert_held(&vm->userptr.notifier_lock); 674 + /* 675 + * Protect against concurrent instances of this function and 676 + * the critical exec sections 677 + */ 678 + xe_vm_assert_held(vm); 679 + 680 + if (!mmu_interval_read_retry(&uvma->userptr.notifier, 681 + uvma->userptr.notifier_seq)) 682 + uvma->userptr.notifier_seq -= 2; 683 + __vma_userptr_invalidate(vm, uvma); 684 + } 685 + #endif 686 + 663 687 int xe_vm_userptr_pin(struct xe_vm *vm) 664 688 { 665 689 struct xe_userptr_vma *uvma, *next; 666 690 int err = 0; 667 - LIST_HEAD(tmp_evict); 668 691 669 692 xe_assert(vm->xe, !xe_vm_in_fault_mode(vm)); 670 693 lockdep_assert_held_write(&vm->lock); 671 694 672 695 /* Collect invalidated userptrs */ 673 696 spin_lock(&vm->userptr.invalidated_lock); 697 + xe_assert(vm->xe, list_empty(&vm->userptr.repin_list)); 674 698 list_for_each_entry_safe(uvma, next, &vm->userptr.invalidated, 675 699 userptr.invalidate_link) { 676 700 list_del_init(&uvma->userptr.invalidate_link); 677 - list_move_tail(&uvma->userptr.repin_link, 678 - &vm->userptr.repin_list); 701 + list_add_tail(&uvma->userptr.repin_link, 702 + &vm->userptr.repin_list); 679 703 } 680 704 spin_unlock(&vm->userptr.invalidated_lock); 681 705 682 - /* Pin and move to temporary list */ 706 + /* Pin and move to bind list */ 683 707 list_for_each_entry_safe(uvma, next, &vm->userptr.repin_list, 684 708 userptr.repin_link) { 685 709 err = 
xe_vma_userptr_pin_pages(uvma); 686 710 if (err == -EFAULT) { 687 711 list_del_init(&uvma->userptr.repin_link); 712 + /* 713 + * We might have already done the pin once already, but 714 + * then had to retry before the re-bind happened, due 715 + * some other condition in the caller, but in the 716 + * meantime the userptr got dinged by the notifier such 717 + * that we need to revalidate here, but this time we hit 718 + * the EFAULT. In such a case make sure we remove 719 + * ourselves from the rebind list to avoid going down in 720 + * flames. 721 + */ 722 + if (!list_empty(&uvma->vma.combined_links.rebind)) 723 + list_del_init(&uvma->vma.combined_links.rebind); 688 724 689 725 /* Wait for pending binds */ 690 726 xe_vm_lock(vm, false); ··· 735 691 err = xe_vm_invalidate_vma(&uvma->vma); 736 692 xe_vm_unlock(vm); 737 693 if (err) 738 - return err; 694 + break; 739 695 } else { 740 - if (err < 0) 741 - return err; 696 + if (err) 697 + break; 742 698 743 699 list_del_init(&uvma->userptr.repin_link); 744 700 list_move_tail(&uvma->vma.combined_links.rebind, ··· 746 702 } 747 703 } 748 704 749 - return 0; 705 + if (err) { 706 + down_write(&vm->userptr.notifier_lock); 707 + spin_lock(&vm->userptr.invalidated_lock); 708 + list_for_each_entry_safe(uvma, next, &vm->userptr.repin_list, 709 + userptr.repin_link) { 710 + list_del_init(&uvma->userptr.repin_link); 711 + list_move_tail(&uvma->userptr.invalidate_link, 712 + &vm->userptr.invalidated); 713 + } 714 + spin_unlock(&vm->userptr.invalidated_lock); 715 + up_write(&vm->userptr.notifier_lock); 716 + } 717 + return err; 750 718 } 751 719 752 720 /** ··· 950 894 return fence; 951 895 } 952 896 897 + static void xe_vm_populate_range_rebind(struct xe_vma_op *op, 898 + struct xe_vma *vma, 899 + struct xe_svm_range *range, 900 + u8 tile_mask) 901 + { 902 + INIT_LIST_HEAD(&op->link); 903 + op->tile_mask = tile_mask; 904 + op->base.op = DRM_GPUVA_OP_DRIVER; 905 + op->subop = XE_VMA_SUBOP_MAP_RANGE; 906 + op->map_range.vma = vma; 
907 + op->map_range.range = range; 908 + } 909 + 910 + static int 911 + xe_vm_ops_add_range_rebind(struct xe_vma_ops *vops, 912 + struct xe_vma *vma, 913 + struct xe_svm_range *range, 914 + u8 tile_mask) 915 + { 916 + struct xe_vma_op *op; 917 + 918 + op = kzalloc(sizeof(*op), GFP_KERNEL); 919 + if (!op) 920 + return -ENOMEM; 921 + 922 + xe_vm_populate_range_rebind(op, vma, range, tile_mask); 923 + list_add_tail(&op->link, &vops->list); 924 + xe_vma_ops_incr_pt_update_ops(vops, tile_mask); 925 + 926 + return 0; 927 + } 928 + 929 + /** 930 + * xe_vm_range_rebind() - VM range (re)bind 931 + * @vm: The VM which the range belongs to. 932 + * @vma: The VMA which the range belongs to. 933 + * @range: SVM range to rebind. 934 + * @tile_mask: Tile mask to bind the range to. 935 + * 936 + * (re)bind SVM range setting up GPU page tables for the range. 937 + * 938 + * Return: dma fence for rebind to signal completion on success, ERR_PTR on 939 + * failure 940 + */ 941 + struct dma_fence *xe_vm_range_rebind(struct xe_vm *vm, 942 + struct xe_vma *vma, 943 + struct xe_svm_range *range, 944 + u8 tile_mask) 945 + { 946 + struct dma_fence *fence = NULL; 947 + struct xe_vma_ops vops; 948 + struct xe_vma_op *op, *next_op; 949 + struct xe_tile *tile; 950 + u8 id; 951 + int err; 952 + 953 + lockdep_assert_held(&vm->lock); 954 + xe_vm_assert_held(vm); 955 + xe_assert(vm->xe, xe_vm_in_fault_mode(vm)); 956 + xe_assert(vm->xe, xe_vma_is_cpu_addr_mirror(vma)); 957 + 958 + xe_vma_ops_init(&vops, vm, NULL, NULL, 0); 959 + for_each_tile(tile, vm->xe, id) { 960 + vops.pt_update_ops[id].wait_vm_bookkeep = true; 961 + vops.pt_update_ops[tile->id].q = 962 + xe_tile_migrate_exec_queue(tile); 963 + } 964 + 965 + err = xe_vm_ops_add_range_rebind(&vops, vma, range, tile_mask); 966 + if (err) 967 + return ERR_PTR(err); 968 + 969 + err = xe_vma_ops_alloc(&vops, false); 970 + if (err) { 971 + fence = ERR_PTR(err); 972 + goto free_ops; 973 + } 974 + 975 + fence = ops_execute(vm, &vops); 976 + 977 + 
free_ops: 978 + list_for_each_entry_safe(op, next_op, &vops.list, link) { 979 + list_del(&op->link); 980 + kfree(op); 981 + } 982 + xe_vma_ops_fini(&vops); 983 + 984 + return fence; 985 + } 986 + 987 + static void xe_vm_populate_range_unbind(struct xe_vma_op *op, 988 + struct xe_svm_range *range) 989 + { 990 + INIT_LIST_HEAD(&op->link); 991 + op->tile_mask = range->tile_present; 992 + op->base.op = DRM_GPUVA_OP_DRIVER; 993 + op->subop = XE_VMA_SUBOP_UNMAP_RANGE; 994 + op->unmap_range.range = range; 995 + } 996 + 997 + static int 998 + xe_vm_ops_add_range_unbind(struct xe_vma_ops *vops, 999 + struct xe_svm_range *range) 1000 + { 1001 + struct xe_vma_op *op; 1002 + 1003 + op = kzalloc(sizeof(*op), GFP_KERNEL); 1004 + if (!op) 1005 + return -ENOMEM; 1006 + 1007 + xe_vm_populate_range_unbind(op, range); 1008 + list_add_tail(&op->link, &vops->list); 1009 + xe_vma_ops_incr_pt_update_ops(vops, range->tile_present); 1010 + 1011 + return 0; 1012 + } 1013 + 1014 + /** 1015 + * xe_vm_range_unbind() - VM range unbind 1016 + * @vm: The VM which the range belongs to. 1017 + * @range: SVM range to unbind. 1018 + * 1019 + * Unbind SVM range removing the GPU page tables for the range. 
1020 + * 1021 + * Return: dma fence for unbind to signal completion on success, ERR_PTR on 1022 + * failure 1023 + */ 1024 + struct dma_fence *xe_vm_range_unbind(struct xe_vm *vm, 1025 + struct xe_svm_range *range) 1026 + { 1027 + struct dma_fence *fence = NULL; 1028 + struct xe_vma_ops vops; 1029 + struct xe_vma_op *op, *next_op; 1030 + struct xe_tile *tile; 1031 + u8 id; 1032 + int err; 1033 + 1034 + lockdep_assert_held(&vm->lock); 1035 + xe_vm_assert_held(vm); 1036 + xe_assert(vm->xe, xe_vm_in_fault_mode(vm)); 1037 + 1038 + if (!range->tile_present) 1039 + return dma_fence_get_stub(); 1040 + 1041 + xe_vma_ops_init(&vops, vm, NULL, NULL, 0); 1042 + for_each_tile(tile, vm->xe, id) { 1043 + vops.pt_update_ops[id].wait_vm_bookkeep = true; 1044 + vops.pt_update_ops[tile->id].q = 1045 + xe_tile_migrate_exec_queue(tile); 1046 + } 1047 + 1048 + err = xe_vm_ops_add_range_unbind(&vops, range); 1049 + if (err) 1050 + return ERR_PTR(err); 1051 + 1052 + err = xe_vma_ops_alloc(&vops, false); 1053 + if (err) { 1054 + fence = ERR_PTR(err); 1055 + goto free_ops; 1056 + } 1057 + 1058 + fence = ops_execute(vm, &vops); 1059 + 1060 + free_ops: 1061 + list_for_each_entry_safe(op, next_op, &vops.list, link) { 1062 + list_del(&op->link); 1063 + kfree(op); 1064 + } 1065 + xe_vma_ops_fini(&vops); 1066 + 1067 + return fence; 1068 + } 1069 + 953 1070 static void xe_vma_free(struct xe_vma *vma) 954 1071 { 955 1072 if (xe_vma_is_userptr(vma)) ··· 1131 902 kfree(vma); 1132 903 } 1133 904 1134 - #define VMA_CREATE_FLAG_READ_ONLY BIT(0) 1135 - #define VMA_CREATE_FLAG_IS_NULL BIT(1) 1136 - #define VMA_CREATE_FLAG_DUMPABLE BIT(2) 905 + #define VMA_CREATE_FLAG_READ_ONLY BIT(0) 906 + #define VMA_CREATE_FLAG_IS_NULL BIT(1) 907 + #define VMA_CREATE_FLAG_DUMPABLE BIT(2) 908 + #define VMA_CREATE_FLAG_IS_SYSTEM_ALLOCATOR BIT(3) 1137 909 1138 910 static struct xe_vma *xe_vma_create(struct xe_vm *vm, 1139 911 struct xe_bo *bo, ··· 1148 918 bool read_only = (flags & VMA_CREATE_FLAG_READ_ONLY); 1149 919 
bool is_null = (flags & VMA_CREATE_FLAG_IS_NULL); 1150 920 bool dumpable = (flags & VMA_CREATE_FLAG_DUMPABLE); 921 + bool is_cpu_addr_mirror = 922 + (flags & VMA_CREATE_FLAG_IS_SYSTEM_ALLOCATOR); 1151 923 1152 924 xe_assert(vm->xe, start < end); 1153 925 xe_assert(vm->xe, end < vm->size); ··· 1158 926 * Allocate and ensure that the xe_vma_is_userptr() return 1159 927 * matches what was allocated. 1160 928 */ 1161 - if (!bo && !is_null) { 929 + if (!bo && !is_null && !is_cpu_addr_mirror) { 1162 930 struct xe_userptr_vma *uvma = kzalloc(sizeof(*uvma), GFP_KERNEL); 1163 931 1164 932 if (!uvma) ··· 1170 938 if (!vma) 1171 939 return ERR_PTR(-ENOMEM); 1172 940 941 + if (is_cpu_addr_mirror) 942 + vma->gpuva.flags |= XE_VMA_SYSTEM_ALLOCATOR; 1173 943 if (is_null) 1174 944 vma->gpuva.flags |= DRM_GPUVA_SPARSE; 1175 945 if (bo) ··· 1214 980 drm_gpuva_link(&vma->gpuva, vm_bo); 1215 981 drm_gpuvm_bo_put(vm_bo); 1216 982 } else /* userptr or null */ { 1217 - if (!is_null) { 983 + if (!is_null && !is_cpu_addr_mirror) { 1218 984 struct xe_userptr *userptr = &to_userptr_vma(vma)->userptr; 1219 985 u64 size = end - start + 1; 1220 986 int err; ··· 1222 988 INIT_LIST_HEAD(&userptr->invalidate_link); 1223 989 INIT_LIST_HEAD(&userptr->repin_link); 1224 990 vma->gpuva.gem.offset = bo_offset_or_userptr; 991 + mutex_init(&userptr->unmap_mutex); 1225 992 1226 993 err = mmu_interval_notifier_insert(&userptr->notifier, 1227 994 current->mm, ··· 1264 1029 * them anymore 1265 1030 */ 1266 1031 mmu_interval_notifier_remove(&userptr->notifier); 1032 + mutex_destroy(&userptr->unmap_mutex); 1267 1033 xe_vm_put(vm); 1268 - } else if (xe_vma_is_null(vma)) { 1034 + } else if (xe_vma_is_null(vma) || xe_vma_is_cpu_addr_mirror(vma)) { 1269 1035 xe_vm_put(vm); 1270 1036 } else { 1271 1037 xe_bo_put(xe_vma_bo(vma)); ··· 1303 1067 xe_assert(vm->xe, vma->gpuva.flags & XE_VMA_DESTROYED); 1304 1068 1305 1069 spin_lock(&vm->userptr.invalidated_lock); 1070 + xe_assert(vm->xe, 
list_empty(&to_userptr_vma(vma)->userptr.repin_link)); 1306 1071 list_del(&to_userptr_vma(vma)->userptr.invalidate_link); 1307 1072 spin_unlock(&vm->userptr.invalidated_lock); 1308 - } else if (!xe_vma_is_null(vma)) { 1073 + } else if (!xe_vma_is_null(vma) && !xe_vma_is_cpu_addr_mirror(vma)) { 1309 1074 xe_bo_assert_held(xe_vma_bo(vma)); 1310 1075 1311 1076 drm_gpuva_unlink(&vma->gpuva); ··· 1757 1520 } 1758 1521 } 1759 1522 1523 + if (flags & XE_VM_FLAG_FAULT_MODE) { 1524 + err = xe_svm_init(vm); 1525 + if (err) 1526 + goto err_close; 1527 + } 1528 + 1760 1529 if (number_tiles > 1) 1761 1530 vm->composite_fence_ctx = dma_fence_context_alloc(1); 1762 1531 ··· 1789 1546 1790 1547 static void xe_vm_close(struct xe_vm *vm) 1791 1548 { 1549 + struct xe_device *xe = vm->xe; 1550 + bool bound; 1551 + int idx; 1552 + 1553 + bound = drm_dev_enter(&xe->drm, &idx); 1554 + 1792 1555 down_write(&vm->lock); 1556 + if (xe_vm_in_fault_mode(vm)) 1557 + xe_svm_notifier_lock(vm); 1558 + 1793 1559 vm->size = 0; 1560 + 1561 + if (!((vm->flags & XE_VM_FLAG_MIGRATION))) { 1562 + struct xe_tile *tile; 1563 + struct xe_gt *gt; 1564 + u8 id; 1565 + 1566 + /* Wait for pending binds */ 1567 + dma_resv_wait_timeout(xe_vm_resv(vm), 1568 + DMA_RESV_USAGE_BOOKKEEP, 1569 + false, MAX_SCHEDULE_TIMEOUT); 1570 + 1571 + if (bound) { 1572 + for_each_tile(tile, xe, id) 1573 + if (vm->pt_root[id]) 1574 + xe_pt_clear(xe, vm->pt_root[id]); 1575 + 1576 + for_each_gt(gt, xe, id) 1577 + xe_gt_tlb_invalidation_vm(gt, vm); 1578 + } 1579 + } 1580 + 1581 + if (xe_vm_in_fault_mode(vm)) 1582 + xe_svm_notifier_unlock(vm); 1794 1583 up_write(&vm->lock); 1584 + 1585 + if (bound) 1586 + drm_dev_exit(idx); 1795 1587 } 1796 1588 1797 1589 void xe_vm_close_and_put(struct xe_vm *vm) ··· 1843 1565 xe_vm_close(vm); 1844 1566 if (xe_vm_in_preempt_fence_mode(vm)) 1845 1567 flush_work(&vm->preempt.rebind_work); 1568 + if (xe_vm_in_fault_mode(vm)) 1569 + xe_svm_close(vm); 1846 1570 1847 1571 down_write(&vm->lock); 1848 1572 
for_each_tile(tile, xe, id) { ··· 1912 1632 list_del_init(&vma->combined_links.destroy); 1913 1633 xe_vma_destroy_unlocked(vma); 1914 1634 } 1635 + 1636 + if (xe_vm_in_fault_mode(vm)) 1637 + xe_svm_fini(vm); 1915 1638 1916 1639 up_write(&vm->lock); 1917 1640 ··· 2272 1989 op->map.read_only = 2273 1990 flags & DRM_XE_VM_BIND_FLAG_READONLY; 2274 1991 op->map.is_null = flags & DRM_XE_VM_BIND_FLAG_NULL; 1992 + op->map.is_cpu_addr_mirror = flags & 1993 + DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR; 2275 1994 op->map.dumpable = flags & DRM_XE_VM_BIND_FLAG_DUMPABLE; 2276 1995 op->map.pat_index = pat_index; 2277 1996 } else if (__op->op == DRM_GPUVA_OP_PREFETCH) { ··· 2466 2181 VMA_CREATE_FLAG_IS_NULL : 0; 2467 2182 flags |= op->map.dumpable ? 2468 2183 VMA_CREATE_FLAG_DUMPABLE : 0; 2184 + flags |= op->map.is_cpu_addr_mirror ? 2185 + VMA_CREATE_FLAG_IS_SYSTEM_ALLOCATOR : 0; 2469 2186 2470 2187 vma = new_vma(vm, &op->base.map, op->map.pat_index, 2471 2188 flags); ··· 2475 2188 return PTR_ERR(vma); 2476 2189 2477 2190 op->map.vma = vma; 2478 - if (op->map.immediate || !xe_vm_in_fault_mode(vm)) 2191 + if ((op->map.immediate || !xe_vm_in_fault_mode(vm)) && 2192 + !op->map.is_cpu_addr_mirror) 2479 2193 xe_vma_ops_incr_pt_update_ops(vops, 2480 2194 op->tile_mask); 2481 2195 break; ··· 2485 2197 { 2486 2198 struct xe_vma *old = 2487 2199 gpuva_to_vma(op->base.remap.unmap->va); 2200 + bool skip = xe_vma_is_cpu_addr_mirror(old); 2201 + u64 start = xe_vma_start(old), end = xe_vma_end(old); 2202 + 2203 + if (op->base.remap.prev) 2204 + start = op->base.remap.prev->va.addr + 2205 + op->base.remap.prev->va.range; 2206 + if (op->base.remap.next) 2207 + end = op->base.remap.next->va.addr; 2208 + 2209 + if (xe_vma_is_cpu_addr_mirror(old) && 2210 + xe_svm_has_mapping(vm, start, end)) 2211 + return -EBUSY; 2488 2212 2489 2213 op->remap.start = xe_vma_start(old); 2490 2214 op->remap.range = xe_vma_size(old); 2491 2215 2492 - if (op->base.remap.prev) { 2493 - flags |= op->base.remap.unmap->va->flags 
& 2494 - XE_VMA_READ_ONLY ? 2495 - VMA_CREATE_FLAG_READ_ONLY : 0; 2496 - flags |= op->base.remap.unmap->va->flags & 2497 - DRM_GPUVA_SPARSE ? 2498 - VMA_CREATE_FLAG_IS_NULL : 0; 2499 - flags |= op->base.remap.unmap->va->flags & 2500 - XE_VMA_DUMPABLE ? 2501 - VMA_CREATE_FLAG_DUMPABLE : 0; 2216 + flags |= op->base.remap.unmap->va->flags & 2217 + XE_VMA_READ_ONLY ? 2218 + VMA_CREATE_FLAG_READ_ONLY : 0; 2219 + flags |= op->base.remap.unmap->va->flags & 2220 + DRM_GPUVA_SPARSE ? 2221 + VMA_CREATE_FLAG_IS_NULL : 0; 2222 + flags |= op->base.remap.unmap->va->flags & 2223 + XE_VMA_DUMPABLE ? 2224 + VMA_CREATE_FLAG_DUMPABLE : 0; 2225 + flags |= xe_vma_is_cpu_addr_mirror(old) ? 2226 + VMA_CREATE_FLAG_IS_SYSTEM_ALLOCATOR : 0; 2502 2227 2228 + if (op->base.remap.prev) { 2503 2229 vma = new_vma(vm, op->base.remap.prev, 2504 2230 old->pat_index, flags); 2505 2231 if (IS_ERR(vma)) ··· 2525 2223 * Userptr creates a new SG mapping so 2526 2224 * we must also rebind. 2527 2225 */ 2528 - op->remap.skip_prev = !xe_vma_is_userptr(old) && 2226 + op->remap.skip_prev = skip || 2227 + (!xe_vma_is_userptr(old) && 2529 2228 IS_ALIGNED(xe_vma_end(vma), 2530 - xe_vma_max_pte_size(old)); 2229 + xe_vma_max_pte_size(old))); 2531 2230 if (op->remap.skip_prev) { 2532 2231 xe_vma_set_pte_size(vma, xe_vma_max_pte_size(old)); 2533 2232 op->remap.range -= ··· 2544 2241 } 2545 2242 2546 2243 if (op->base.remap.next) { 2547 - flags |= op->base.remap.unmap->va->flags & 2548 - XE_VMA_READ_ONLY ? 2549 - VMA_CREATE_FLAG_READ_ONLY : 0; 2550 - flags |= op->base.remap.unmap->va->flags & 2551 - DRM_GPUVA_SPARSE ? 2552 - VMA_CREATE_FLAG_IS_NULL : 0; 2553 - flags |= op->base.remap.unmap->va->flags & 2554 - XE_VMA_DUMPABLE ? 2555 - VMA_CREATE_FLAG_DUMPABLE : 0; 2556 - 2557 2244 vma = new_vma(vm, op->base.remap.next, 2558 2245 old->pat_index, flags); 2559 2246 if (IS_ERR(vma)) ··· 2555 2262 * Userptr creates a new SG mapping so 2556 2263 * we must also rebind. 
2557 2264 */ 2558 - op->remap.skip_next = !xe_vma_is_userptr(old) && 2265 + op->remap.skip_next = skip || 2266 + (!xe_vma_is_userptr(old) && 2559 2267 IS_ALIGNED(xe_vma_start(vma), 2560 - xe_vma_max_pte_size(old)); 2268 + xe_vma_max_pte_size(old))); 2561 2269 if (op->remap.skip_next) { 2562 2270 xe_vma_set_pte_size(vma, xe_vma_max_pte_size(old)); 2563 2271 op->remap.range -= ··· 2571 2277 xe_vma_ops_incr_pt_update_ops(vops, op->tile_mask); 2572 2278 } 2573 2279 } 2574 - xe_vma_ops_incr_pt_update_ops(vops, op->tile_mask); 2280 + if (!skip) 2281 + xe_vma_ops_incr_pt_update_ops(vops, op->tile_mask); 2575 2282 break; 2576 2283 } 2577 2284 case DRM_GPUVA_OP_UNMAP: 2285 + vma = gpuva_to_vma(op->base.unmap.va); 2286 + 2287 + if (xe_vma_is_cpu_addr_mirror(vma) && 2288 + xe_svm_has_mapping(vm, xe_vma_start(vma), 2289 + xe_vma_end(vma))) 2290 + return -EBUSY; 2291 + 2292 + if (!xe_vma_is_cpu_addr_mirror(vma)) 2293 + xe_vma_ops_incr_pt_update_ops(vops, op->tile_mask); 2294 + break; 2578 2295 case DRM_GPUVA_OP_PREFETCH: 2579 - /* FIXME: Need to skip some prefetch ops */ 2580 - xe_vma_ops_incr_pt_update_ops(vops, op->tile_mask); 2296 + vma = gpuva_to_vma(op->base.prefetch.va); 2297 + 2298 + if (xe_vma_is_userptr(vma)) { 2299 + err = xe_vma_userptr_pin_pages(to_userptr_vma(vma)); 2300 + if (err) 2301 + return err; 2302 + } 2303 + 2304 + if (!xe_vma_is_cpu_addr_mirror(vma)) 2305 + xe_vma_ops_incr_pt_update_ops(vops, op->tile_mask); 2581 2306 break; 2582 2307 default: 2583 2308 drm_warn(&vm->xe->drm, "NOT POSSIBLE"); ··· 2822 2509 case DRM_GPUVA_OP_PREFETCH: 2823 2510 trace_xe_vma_bind(gpuva_to_vma(op->base.prefetch.va)); 2824 2511 break; 2512 + case DRM_GPUVA_OP_DRIVER: 2513 + break; 2825 2514 default: 2826 2515 XE_WARN_ON("NOT POSSIBLE"); 2827 2516 } ··· 3001 2686 } 3002 2687 if (ufence) 3003 2688 xe_sync_ufence_put(ufence); 3004 - for (i = 0; i < vops->num_syncs; i++) 3005 - xe_sync_entry_signal(vops->syncs + i, fence); 3006 - xe_exec_queue_last_fence_set(wait_exec_queue, vm, 
fence); 2689 + if (fence) { 2690 + for (i = 0; i < vops->num_syncs; i++) 2691 + xe_sync_entry_signal(vops->syncs + i, fence); 2692 + xe_exec_queue_last_fence_set(wait_exec_queue, vm, fence); 2693 + } 3007 2694 } 3008 2695 3009 2696 static struct dma_fence *vm_bind_ioctl_ops_execute(struct xe_vm *vm, ··· 3028 2711 } 3029 2712 3030 2713 fence = ops_execute(vm, vops); 3031 - if (IS_ERR(fence)) 2714 + if (IS_ERR(fence)) { 2715 + if (PTR_ERR(fence) == -ENODATA) 2716 + vm_bind_ioctl_ops_fini(vm, vops, NULL); 3032 2717 goto unlock; 2718 + } 3033 2719 3034 2720 vm_bind_ioctl_ops_fini(vm, vops, fence); 3035 2721 } ··· 3048 2728 DRM_XE_VM_BIND_FLAG_IMMEDIATE | \ 3049 2729 DRM_XE_VM_BIND_FLAG_NULL | \ 3050 2730 DRM_XE_VM_BIND_FLAG_DUMPABLE | \ 3051 - DRM_XE_VM_BIND_FLAG_CHECK_PXP) 2731 + DRM_XE_VM_BIND_FLAG_CHECK_PXP | \ 2732 + DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR) 3052 2733 3053 2734 #ifdef TEST_VM_OPS_ERROR 3054 2735 #define SUPPORTED_FLAGS (SUPPORTED_FLAGS_STUB | FORCE_OP_ERROR) ··· 3060 2739 #define XE_64K_PAGE_MASK 0xffffull 3061 2740 #define ALL_DRM_XE_SYNCS_FLAGS (DRM_XE_SYNCS_FLAG_WAIT_FOR_OP) 3062 2741 3063 - static int vm_bind_ioctl_check_args(struct xe_device *xe, 2742 + static int vm_bind_ioctl_check_args(struct xe_device *xe, struct xe_vm *vm, 3064 2743 struct drm_xe_vm_bind *args, 3065 2744 struct drm_xe_vm_bind_op **bind_ops) 3066 2745 { ··· 3105 2784 u64 obj_offset = (*bind_ops)[i].obj_offset; 3106 2785 u32 prefetch_region = (*bind_ops)[i].prefetch_mem_region_instance; 3107 2786 bool is_null = flags & DRM_XE_VM_BIND_FLAG_NULL; 2787 + bool is_cpu_addr_mirror = flags & 2788 + DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR; 3108 2789 u16 pat_index = (*bind_ops)[i].pat_index; 3109 2790 u16 coh_mode; 2791 + 2792 + if (XE_IOCTL_DBG(xe, is_cpu_addr_mirror && 2793 + (!xe_vm_in_fault_mode(vm) || 2794 + !IS_ENABLED(CONFIG_DRM_GPUSVM)))) { 2795 + err = -EINVAL; 2796 + goto free_bind_ops; 2797 + } 3110 2798 3111 2799 if (XE_IOCTL_DBG(xe, pat_index >= xe->pat.n_entries)) { 3112 2800 
err = -EINVAL; ··· 3137 2807 3138 2808 if (XE_IOCTL_DBG(xe, op > DRM_XE_VM_BIND_OP_PREFETCH) || 3139 2809 XE_IOCTL_DBG(xe, flags & ~SUPPORTED_FLAGS) || 3140 - XE_IOCTL_DBG(xe, obj && is_null) || 3141 - XE_IOCTL_DBG(xe, obj_offset && is_null) || 2810 + XE_IOCTL_DBG(xe, obj && (is_null || is_cpu_addr_mirror)) || 2811 + XE_IOCTL_DBG(xe, obj_offset && (is_null || 2812 + is_cpu_addr_mirror)) || 3142 2813 XE_IOCTL_DBG(xe, op != DRM_XE_VM_BIND_OP_MAP && 3143 - is_null) || 2814 + (is_null || is_cpu_addr_mirror)) || 3144 2815 XE_IOCTL_DBG(xe, !obj && 3145 2816 op == DRM_XE_VM_BIND_OP_MAP && 3146 - !is_null) || 2817 + !is_null && !is_cpu_addr_mirror) || 3147 2818 XE_IOCTL_DBG(xe, !obj && 3148 2819 op == DRM_XE_VM_BIND_OP_UNMAP_ALL) || 3149 2820 XE_IOCTL_DBG(xe, addr && ··· 3293 2962 int err; 3294 2963 int i; 3295 2964 3296 - err = vm_bind_ioctl_check_args(xe, args, &bind_ops); 2965 + vm = xe_vm_lookup(xef, args->vm_id); 2966 + if (XE_IOCTL_DBG(xe, !vm)) 2967 + return -EINVAL; 2968 + 2969 + err = vm_bind_ioctl_check_args(xe, vm, args, &bind_ops); 3297 2970 if (err) 3298 - return err; 2971 + goto put_vm; 3299 2972 3300 2973 if (args->exec_queue_id) { 3301 2974 q = xe_exec_queue_lookup(xef, args->exec_queue_id); 3302 2975 if (XE_IOCTL_DBG(xe, !q)) { 3303 2976 err = -ENOENT; 3304 - goto free_objs; 2977 + goto put_vm; 3305 2978 } 3306 2979 3307 2980 if (XE_IOCTL_DBG(xe, !(q->flags & EXEC_QUEUE_FLAG_VM))) { ··· 3314 2979 } 3315 2980 } 3316 2981 3317 - vm = xe_vm_lookup(xef, args->vm_id); 3318 - if (XE_IOCTL_DBG(xe, !vm)) { 3319 - err = -EINVAL; 3320 - goto put_exec_queue; 3321 - } 2982 + /* Ensure all UNMAPs visible */ 2983 + if (xe_vm_in_fault_mode(vm)) 2984 + flush_work(&vm->svm.garbage_collector.work); 3322 2985 3323 2986 err = down_write_killable(&vm->lock); 3324 2987 if (err) 3325 - goto put_vm; 2988 + goto put_exec_queue; 3326 2989 3327 2990 if (XE_IOCTL_DBG(xe, xe_vm_is_closed_or_banned(vm))) { 3328 2991 err = -ENOENT; ··· 3484 3151 xe_bo_put(bos[i]); 3485 3152 
release_vm_lock: 3486 3153 up_write(&vm->lock); 3487 - put_vm: 3488 - xe_vm_put(vm); 3489 3154 put_exec_queue: 3490 3155 if (q) 3491 3156 xe_exec_queue_put(q); 3492 - free_objs: 3157 + put_vm: 3158 + xe_vm_put(vm); 3493 3159 kvfree(bos); 3494 3160 kvfree(ops); 3495 3161 if (args->num_binds > 1) ··· 3620 3288 int ret = 0; 3621 3289 3622 3290 xe_assert(xe, !xe_vma_is_null(vma)); 3291 + xe_assert(xe, !xe_vma_is_cpu_addr_mirror(vma)); 3623 3292 trace_xe_vma_invalidate(vma); 3624 3293 3625 3294 vm_dbg(&xe_vma_vm(vma)->xe->drm,
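The reworked error path in xe_vm_userptr_pin() above no longer returns with entries stranded on the repin list: on failure it moves whatever is still queued back to the invalidated list so a later call can retry. A minimal userspace sketch of that requeue pattern follows. The list helpers, `struct item`, `struct queues` and `repin_all()` are hypothetical stand-ins, and the locking and the -EFAULT special case of the real function are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's intrusive doubly linked list. */
struct list_node { struct list_node *prev, *next; };

static void list_init(struct list_node *n) { n->prev = n->next = n; }
static int list_is_empty(const struct list_node *h) { return h->next == h; }

static void list_unlink(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

static void list_append(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Hypothetical userptr-like object: one link per list it can sit on. */
struct item {
	struct list_node invalidate_link;
	struct list_node repin_link;
	int pin_result;	/* value the fake pin step returns for this item */
	int pinned;
};

struct queues {
	struct list_node invalidated;
	struct list_node repin;
};

#define item_from_invalidate(n) \
	((struct item *)((char *)(n) - offsetof(struct item, invalidate_link)))
#define item_from_repin(n) \
	((struct item *)((char *)(n) - offsetof(struct item, repin_link)))

/*
 * Drain "invalidated" into "repin", pin each entry in order, and on the
 * first failure push everything still on "repin" back to "invalidated".
 */
int repin_all(struct queues *q)
{
	int err = 0;

	assert(list_is_empty(&q->repin));
	while (!list_is_empty(&q->invalidated)) {
		struct item *it = item_from_invalidate(q->invalidated.next);

		list_unlink(&it->invalidate_link);
		list_append(&it->repin_link, &q->repin);
	}

	while (!list_is_empty(&q->repin)) {
		struct item *it = item_from_repin(q->repin.next);

		err = it->pin_result;
		if (err)
			break;	/* leave the failing item on the repin list */
		it->pinned = 1;
		list_unlink(&it->repin_link);
	}

	/* Rollback: nothing may stay on the repin list after an error. */
	while (err && !list_is_empty(&q->repin)) {
		struct item *it = item_from_repin(q->repin.next);

		list_unlink(&it->repin_link);
		list_append(&it->invalidate_link, &q->invalidated);
	}
	return err;
}
```

Requeueing keeps the invariant that the repin list is empty on entry, which is what the assertion added at the top of the real function checks.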

+23 -2

drivers/gpu/drm/xe/xe_vm.h

···
 struct xe_exec_queue;
 struct xe_file;
 struct xe_sync_entry;
+struct xe_svm_range;
 struct drm_exec;
 
 struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags);
···
 	return vma->gpuva.flags & DRM_GPUVA_SPARSE;
 }
 
+static inline bool xe_vma_is_cpu_addr_mirror(struct xe_vma *vma)
+{
+	return vma->gpuva.flags & XE_VMA_SYSTEM_ALLOCATOR;
+}
+
 static inline bool xe_vma_has_no_bo(struct xe_vma *vma)
 {
 	return !xe_vma_bo(vma);
···
 
 static inline bool xe_vma_is_userptr(struct xe_vma *vma)
 {
-	return xe_vma_has_no_bo(vma) && !xe_vma_is_null(vma);
+	return xe_vma_has_no_bo(vma) && !xe_vma_is_null(vma) &&
+		!xe_vma_is_cpu_addr_mirror(vma);
 }
 
 /**
···
 int xe_vm_rebind(struct xe_vm *vm, bool rebind_worker);
 struct dma_fence *xe_vma_rebind(struct xe_vm *vm, struct xe_vma *vma,
 				u8 tile_mask);
+struct dma_fence *xe_vm_range_rebind(struct xe_vm *vm,
+				     struct xe_vma *vma,
+				     struct xe_svm_range *range,
+				     u8 tile_mask);
+struct dma_fence *xe_vm_range_unbind(struct xe_vm *vm,
+				     struct xe_svm_range *range);
 
 int xe_vm_invalidate_vma(struct xe_vma *vma);
 
···
 	       const char *format, ...)
 { /* noop */ }
 #endif
-#endif
 
 struct xe_vm_snapshot *xe_vm_snapshot_capture(struct xe_vm *vm);
 void xe_vm_snapshot_capture_delayed(struct xe_vm_snapshot *snap);
 void xe_vm_snapshot_print(struct xe_vm_snapshot *snap, struct drm_printer *p);
 void xe_vm_snapshot_free(struct xe_vm_snapshot *snap);
+
+#if IS_ENABLED(CONFIG_DRM_XE_USERPTR_INVAL_INJECT)
+void xe_vma_userptr_force_invalidate(struct xe_userptr_vma *uvma);
+#else
+static inline void xe_vma_userptr_force_invalidate(struct xe_userptr_vma *uvma)
+{
+}
+#endif
+#endif
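The header change above makes the VMA kinds mutually exclusive: a VMA without a backing object counts as a userptr only if it is neither a NULL (sparse) binding nor a CPU address mirror. The classification can be sketched in isolation as below. The flag values and `struct vma` here are illustrative assumptions, not the driver's definitions (the real bits are derived from DRM_GPUVA_USERBITS).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits only; the driver derives its own values. */
#define VMA_FLAG_SPARSE          (1u << 0)	/* NULL binding */
#define VMA_FLAG_CPU_ADDR_MIRROR (1u << 1)	/* system-allocator (SVM) binding */

/* Hypothetical, stripped-down VMA: flags plus an optional backing object. */
struct vma {
	unsigned int flags;
	const void *bo;
};

static bool vma_is_null(const struct vma *v)
{
	return v->flags & VMA_FLAG_SPARSE;
}

static bool vma_is_cpu_addr_mirror(const struct vma *v)
{
	return v->flags & VMA_FLAG_CPU_ADDR_MIRROR;
}

static bool vma_has_no_bo(const struct vma *v)
{
	return !v->bo;
}

/* A userptr is whatever has no BO and is not one of the special kinds. */
static bool vma_is_userptr(const struct vma *v)
{
	return vma_has_no_bo(v) && !vma_is_null(v) &&
		!vma_is_cpu_addr_mirror(v);
}
```

Without the extra term, a CPU-address-mirror VMA (which also has no BO) would be misread as a userptr and treated as if it had a notifier and pinned pages.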

+63 -2

drivers/gpu/drm/xe/xe_vm_types.h

···
 #ifndef _XE_VM_TYPES_H_
 #define _XE_VM_TYPES_H_
 
+#include <drm/drm_gpusvm.h>
 #include <drm/drm_gpuvm.h>
 
 #include <linux/dma-resv.h>
···
 #include "xe_range_fence.h"
 
 struct xe_bo;
+struct xe_svm_range;
 struct xe_sync_entry;
 struct xe_user_fence;
 struct xe_vm;
···
 #define XE_VMA_PTE_64K		(DRM_GPUVA_USERBITS << 6)
 #define XE_VMA_PTE_COMPACT	(DRM_GPUVA_USERBITS << 7)
 #define XE_VMA_DUMPABLE		(DRM_GPUVA_USERBITS << 8)
+#define XE_VMA_SYSTEM_ALLOCATOR	(DRM_GPUVA_USERBITS << 9)
 
 /** struct xe_userptr - User pointer */
 struct xe_userptr {
···
 	struct sg_table *sg;
 	/** @notifier_seq: notifier sequence number */
 	unsigned long notifier_seq;
+	/** @unmap_mutex: Mutex protecting dma-unmapping */
+	struct mutex unmap_mutex;
 	/**
 	 * @initial_bind: user pointer has been bound at least once.
 	 * write: vm->userptr.notifier_lock in read mode and vm->resv held.
 	 * read: vm->userptr.notifier_lock in write mode or vm->resv held.
 	 */
 	bool initial_bind;
+	/** @mapped: Whether the @sg sg-table is dma-mapped. Protected by @unmap_mutex. */
+	bool mapped;
 #if IS_ENABLED(CONFIG_DRM_XE_USERPTR_INVAL_INJECT)
 	u32 divisor;
 #endif
···
 struct xe_vm {
 	/** @gpuvm: base GPUVM used to track VMAs */
 	struct drm_gpuvm gpuvm;
+
+	/** @svm: Shared virtual memory state */
+	struct {
+		/** @svm.gpusvm: base GPUSVM used to track fault allocations */
+		struct drm_gpusvm gpusvm;
+		/**
+		 * @svm.garbage_collector: Garbage collector which is used to
+		 * unmap SVM range's GPU bindings and destroy the ranges.
+		 */
+		struct {
+			/** @svm.garbage_collector.lock: Protects range list */
+			spinlock_t lock;
+			/**
+			 * @svm.garbage_collector.range_list: List of SVM ranges
+			 * in the garbage collector.
+			 */
+			struct list_head range_list;
+			/**
+			 * @svm.garbage_collector.work: Worker which the
+			 * garbage collector runs on.
+			 */
+			struct work_struct work;
+		} garbage_collector;
+	} svm;
 
 	struct xe_device *xe;
 
···
 		 * up for revalidation. Protected from access with the
 		 * @invalidated_lock. Removing items from the list
 		 * additionally requires @lock in write mode, and adding
-		 * items to the list requires the @userptr.notifer_lock in
-		 * write mode.
+		 * items to the list requires either the @userptr.notifier_lock in
+		 * write mode, OR @lock in write mode.
 		 */
 		struct list_head invalidated;
 	} userptr;
···
 	bool read_only;
 	/** @is_null: is NULL binding */
 	bool is_null;
+	/** @is_cpu_addr_mirror: is CPU address mirror binding */
+	bool is_cpu_addr_mirror;
 	/** @dumpable: whether BO is dumped on GPU hang */
 	bool dumpable;
 	/** @pat_index: The pat index to use for this operation. */
···
 	u32 region;
 };
 
+/** struct xe_vma_op_map_range - VMA map range operation */
+struct xe_vma_op_map_range {
+	/** @vma: VMA to map (system allocator VMA) */
+	struct xe_vma *vma;
+	/** @range: SVM range to map */
+	struct xe_svm_range *range;
+};
+
+/** struct xe_vma_op_unmap_range - VMA unmap range operation */
+struct xe_vma_op_unmap_range {
+	/** @range: SVM range to unmap */
+	struct xe_svm_range *range;
+};
+
 /** enum xe_vma_op_flags - flags for VMA operation */
 enum xe_vma_op_flags {
 	/** @XE_VMA_OP_COMMITTED: VMA operation committed */
···
 	XE_VMA_OP_NEXT_COMMITTED	= BIT(2),
 };
 
+/** enum xe_vma_subop - VMA sub-operation */
+enum xe_vma_subop {
+	/** @XE_VMA_SUBOP_MAP_RANGE: Map range */
+	XE_VMA_SUBOP_MAP_RANGE,
+	/** @XE_VMA_SUBOP_UNMAP_RANGE: Unmap range */
+	XE_VMA_SUBOP_UNMAP_RANGE,
+};
+
 /** struct xe_vma_op - VMA operation */
 struct xe_vma_op {
 	/** @base: GPUVA base operation */
···
 	struct list_head link;
 	/** @flags: operation flags */
 	enum xe_vma_op_flags flags;
+	/** @subop: user defined sub-operation */
+	enum xe_vma_subop subop;
 	/** @tile_mask: Tile mask for operation */
 	u8 tile_mask;
 
···
 		struct xe_vma_op_remap remap;
 		/** @prefetch: VMA prefetch operation specific data */
 		struct xe_vma_op_prefetch prefetch;
+		/** @map_range: VMA map range operation specific data */
+		struct xe_vma_op_map_range map_range;
+		/** @unmap_range: VMA unmap range operation specific data */
+		struct xe_vma_op_unmap_range unmap_range;
 	};
 };
 
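The new `subop` field together with the two union members forms a small tagged union: when the base operation is the driver-defined one, `subop` says which of `map_range` or `unmap_range` is valid. A rough, self-contained sketch of that layout and of how a consumer might dispatch on it is shown below. All type and function names are simplified stand-ins chosen for illustration, and `op_range()` in particular is a hypothetical helper, not something the patch adds.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the base op and the driver sub-operation. */
enum base_op { BASE_OP_MAP, BASE_OP_UNMAP, BASE_OP_DRIVER };
enum sub_op { SUBOP_MAP_RANGE, SUBOP_UNMAP_RANGE };

struct svm_range { unsigned char tile_present; };
struct vma { int id; };

struct vma_op {
	enum base_op base;
	enum sub_op subop;	/* only meaningful when base == BASE_OP_DRIVER */
	unsigned char tile_mask;
	union {
		struct { struct vma *vma; struct svm_range *range; } map_range;
		struct { struct svm_range *range; } unmap_range;
	};
};

/* Fill in a driver op that (re)binds a range on the given tiles. */
static void populate_range_rebind(struct vma_op *op, struct vma *vma,
				  struct svm_range *range,
				  unsigned char tile_mask)
{
	op->base = BASE_OP_DRIVER;
	op->subop = SUBOP_MAP_RANGE;
	op->tile_mask = tile_mask;
	op->map_range.vma = vma;
	op->map_range.range = range;
}

/* Fill in a driver op that unbinds a range from every tile it is on. */
static void populate_range_unbind(struct vma_op *op, struct svm_range *range)
{
	op->base = BASE_OP_DRIVER;
	op->subop = SUBOP_UNMAP_RANGE;
	op->tile_mask = range->tile_present;
	op->unmap_range.range = range;
}

/* Return the range a driver op targets, or NULL for any other op. */
static struct svm_range *op_range(const struct vma_op *op)
{
	if (op->base != BASE_OP_DRIVER)
		return NULL;

	switch (op->subop) {
	case SUBOP_MAP_RANGE:
		return op->map_range.range;
	case SUBOP_UNMAP_RANGE:
		return op->unmap_range.range;
	}
	return NULL;
}
```

Note the asymmetry the sketch preserves from the diff: the rebind takes its tile mask from the caller, while the unbind always uses the tiles recorded in the range itself.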

+19

drivers/gpu/drm/xe/xe_wa.c

···
 		       FUNC(xe_rtp_match_first_render_or_compute)),
 	  XE_RTP_ACTIONS(SET(TDL_CHICKEN, QID_WAIT_FOR_THREAD_NOT_RUN_DISABLE))
 	},
+	{ XE_RTP_NAME("13012615864"),
+	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(3000, 3001),
+		       FUNC(xe_rtp_match_first_render_or_compute)),
+	  XE_RTP_ACTIONS(SET(TDL_TSL_CHICKEN, RES_CHK_SPR_DIS))
+	},
 
 	{}
 };
 
 static const struct xe_rtp_entry_sr lrc_was[] = {
+	{ XE_RTP_NAME("16011163337"),
+	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1200, 1210), ENGINE_CLASS(RENDER)),
+	  /* read verification is ignored due to 1608008084. */
+	  XE_RTP_ACTIONS(FIELD_SET_NO_READ_MASK(FF_MODE2,
+						FF_MODE2_GS_TIMER_MASK,
+						FF_MODE2_GS_TIMER_224))
+	},
+	{ XE_RTP_NAME("1604555607"),
+	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1200, 1210), ENGINE_CLASS(RENDER)),
+	  /* read verification is ignored due to 1608008084. */
+	  XE_RTP_ACTIONS(FIELD_SET_NO_READ_MASK(FF_MODE2,
+						FF_MODE2_TDS_TIMER_MASK,
+						FF_MODE2_TDS_TIMER_128))
+	},
 	{ XE_RTP_NAME("1409342910, 14010698770, 14010443199, 1408979724, 1409178076, 1409207793, 1409217633, 1409252684, 1409347922, 1409142259"),
 	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1200, 1210)),
 	  XE_RTP_ACTIONS(SET(COMMON_SLICE_CHICKEN3,

+10

drivers/gpu/drm/xe/xe_wa_oob.rules

···
 22011391025	PLATFORM(DG2)
 22012727170	SUBPLATFORM(DG2, G11)
 22012727685	SUBPLATFORM(DG2, G11)
+22016596838	PLATFORM(PVC)
 18020744125	PLATFORM(PVC)
 1509372804	PLATFORM(PVC), GRAPHICS_STEP(A0, C0)
 1409600907	GRAPHICS_VERSION_RANGE(1200, 1250)
···
 no_media_l3	MEDIA_VERSION(3000)
 14022866841	GRAPHICS_VERSION(3000), GRAPHICS_STEP(A0, B0)
 		MEDIA_VERSION(3000), MEDIA_STEP(A0, B0)
+16021333562	GRAPHICS_VERSION_RANGE(1200, 1274)
+		MEDIA_VERSION(1300)
+14016712196	GRAPHICS_VERSION(1255)
+		GRAPHICS_VERSION_RANGE(1270, 1274)
+14015568240	GRAPHICS_VERSION_RANGE(1255, 1260)
+18013179988	GRAPHICS_VERSION(1255)
+		GRAPHICS_VERSION_RANGE(1270, 1274)
+1508761755	GRAPHICS_VERSION(1255)
+		GRAPHICS_VERSION(1260), GRAPHICS_STEP(A0, B0)

+509

include/drm/drm_gpusvm.h

···
+/* SPDX-License-Identifier: GPL-2.0-only OR MIT */
+/*
+ * Copyright © 2024 Intel Corporation
+ */
+
+#ifndef __DRM_GPUSVM_H__
+#define __DRM_GPUSVM_H__
+
+#include <linux/kref.h>
+#include <linux/interval_tree.h>
+#include <linux/mmu_notifier.h>
+
+struct dev_pagemap_ops;
+struct drm_device;
+struct drm_gpusvm;
+struct drm_gpusvm_notifier;
+struct drm_gpusvm_ops;
+struct drm_gpusvm_range;
+struct drm_gpusvm_devmem;
+struct drm_pagemap;
+struct drm_pagemap_device_addr;
+
+/**
+ * struct drm_gpusvm_devmem_ops - Operations structure for GPU SVM device memory
+ *
+ * This structure defines the operations for GPU Shared Virtual Memory (SVM)
+ * device memory. These operations are provided by the GPU driver to manage device memory
+ * allocations and perform operations such as migration between device memory and system
+ * RAM.
+ */
+struct drm_gpusvm_devmem_ops {
+	/**
+	 * @devmem_release: Release device memory allocation (optional)
+	 * @devmem_allocation: device memory allocation
+	 *
+	 * Release device memory allocation and drop a reference to device
+	 * memory allocation.
+	 */
+	void (*devmem_release)(struct drm_gpusvm_devmem *devmem_allocation);
+
+	/**
+	 * @populate_devmem_pfn: Populate device memory PFN (required for migration)
+	 * @devmem_allocation: device memory allocation
+	 * @npages: Number of pages to populate
+	 * @pfn: Array of page frame numbers to populate
+	 *
+	 * Populate device memory page frame numbers (PFN).
+	 *
+	 * Return: 0 on success, a negative error code on failure.
+	 */
+	int (*populate_devmem_pfn)(struct drm_gpusvm_devmem *devmem_allocation,
+				   unsigned long npages, unsigned long *pfn);
+
+	/**
+	 * @copy_to_devmem: Copy to device memory (required for migration)
+	 * @pages: Pointer to array of device memory pages (destination)
+	 * @dma_addr: Pointer to array of DMA addresses (source)
+	 * @npages: Number of pages to copy
+	 *
+	 * Copy pages to device memory.
+	 *
+	 * Return: 0 on success, a negative error code on failure.
+	 */
+	int (*copy_to_devmem)(struct page **pages,
+			      dma_addr_t *dma_addr,
+			      unsigned long npages);
+
+	/**
+	 * @copy_to_ram: Copy to system RAM (required for migration)
+	 * @pages: Pointer to array of device memory pages (source)
+	 * @dma_addr: Pointer to array of DMA addresses (destination)
+	 * @npages: Number of pages to copy
+	 *
+	 * Copy pages to system RAM.
+	 *
+	 * Return: 0 on success, a negative error code on failure.
+	 */
+	int (*copy_to_ram)(struct page **pages,
+			   dma_addr_t *dma_addr,
+			   unsigned long npages);
+};
+
+/**
+ * struct drm_gpusvm_devmem - Structure representing a GPU SVM device memory allocation
+ *
+ * @dev: Pointer to the device structure which device memory allocation belongs to
+ * @mm: Pointer to the mm_struct for the address space
+ * @detached: device memory allocation is detached from device pages
+ * @ops: Pointer to the operations structure for GPU SVM device memory
+ * @dpagemap: The struct drm_pagemap of the pages this allocation belongs to.
+ * @size: Size of device memory allocation
+ */
+struct drm_gpusvm_devmem {
+	struct device *dev;
+	struct mm_struct *mm;
+	struct completion detached;
+	const struct drm_gpusvm_devmem_ops *ops;
+	struct drm_pagemap *dpagemap;
+	size_t size;
+};
+
+/**
+ * struct drm_gpusvm_ops - Operations structure for GPU SVM
+ *
+ * This structure defines the operations for GPU Shared Virtual Memory (SVM).
+ * These operations are provided by the GPU driver to manage SVM ranges and
+ * notifiers.
+ */
+struct drm_gpusvm_ops {
+	/**
+	 * @notifier_alloc: Allocate a GPU SVM notifier (optional)
+	 *
+	 * Allocate a GPU SVM notifier.
+	 *
+	 * Return: Pointer to the allocated GPU SVM notifier on success, NULL on failure.
+	 */
+	struct drm_gpusvm_notifier *(*notifier_alloc)(void);
+
+	/**
+	 * @notifier_free: Free a GPU SVM notifier (optional)
+	 * @notifier: Pointer to the GPU SVM notifier to be freed
+	 *
+	 * Free a GPU SVM notifier.
+	 */
+	void (*notifier_free)(struct drm_gpusvm_notifier *notifier);
+
+	/**
+	 * @range_alloc: Allocate a GPU SVM range (optional)
+	 * @gpusvm: Pointer to the GPU SVM
+	 *
+	 * Allocate a GPU SVM range.
+	 *
+	 * Return: Pointer to the allocated GPU SVM range on success, NULL on failure.
+	 */
+	struct drm_gpusvm_range *(*range_alloc)(struct drm_gpusvm *gpusvm);
+
+	/**
+	 * @range_free: Free a GPU SVM range (optional)
+	 * @range: Pointer to the GPU SVM range to be freed
+	 *
+	 * Free a GPU SVM range.
+	 */
+	void (*range_free)(struct drm_gpusvm_range *range);
+
+	/**
+	 * @invalidate: Invalidate GPU SVM notifier (required)
+	 * @gpusvm: Pointer to the GPU SVM
+	 * @notifier: Pointer to the GPU SVM notifier
+	 * @mmu_range: Pointer to the mmu_notifier_range structure
+	 *
+	 * Invalidate the GPU page tables.
It can safely walk the notifier range 152 + * RB tree/list in this function. Called while holding the notifier lock. 153 + */ 154 + void (*invalidate)(struct drm_gpusvm *gpusvm, 155 + struct drm_gpusvm_notifier *notifier, 156 + const struct mmu_notifier_range *mmu_range); 157 + }; 158 + 159 + /** 160 + * struct drm_gpusvm_notifier - Structure representing a GPU SVM notifier 161 + * 162 + * @gpusvm: Pointer to the GPU SVM structure 163 + * @notifier: MMU interval notifier 164 + * @itree: Interval tree node for the notifier (inserted in GPU SVM) 165 + * @entry: List entry to fast interval tree traversal 166 + * @root: Cached root node of the RB tree containing ranges 167 + * @range_list: List head containing of ranges in the same order they appear in 168 + * interval tree. This is useful to keep iterating ranges while 169 + * doing modifications to RB tree. 170 + * @flags: Flags for notifier 171 + * @flags.removed: Flag indicating whether the MMU interval notifier has been 172 + * removed 173 + * 174 + * This structure represents a GPU SVM notifier. 175 + */ 176 + struct drm_gpusvm_notifier { 177 + struct drm_gpusvm *gpusvm; 178 + struct mmu_interval_notifier notifier; 179 + struct interval_tree_node itree; 180 + struct list_head entry; 181 + struct rb_root_cached root; 182 + struct list_head range_list; 183 + struct { 184 + u32 removed : 1; 185 + } flags; 186 + }; 187 + 188 + /** 189 + * struct drm_gpusvm_range - Structure representing a GPU SVM range 190 + * 191 + * @gpusvm: Pointer to the GPU SVM structure 192 + * @notifier: Pointer to the GPU SVM notifier 193 + * @refcount: Reference count for the range 194 + * @itree: Interval tree node for the range (inserted in GPU SVM notifier) 195 + * @entry: List entry to fast interval tree traversal 196 + * @notifier_seq: Notifier sequence number of the range's pages 197 + * @dma_addr: Device address array 198 + * @dpagemap: The struct drm_pagemap of the device pages we're dma-mapping. 
199 + * Note this is assuming only one drm_pagemap per range is allowed. 200 + * @flags: Flags for range 201 + * @flags.migrate_devmem: Flag indicating whether the range can be migrated to device memory 202 + * @flags.unmapped: Flag indicating if the range has been unmapped 203 + * @flags.partial_unmap: Flag indicating if the range has been partially unmapped 204 + * @flags.has_devmem_pages: Flag indicating if the range has devmem pages 205 + * @flags.has_dma_mapping: Flag indicating if the range has a DMA mapping 206 + * 207 + * This structure represents a GPU SVM range used for tracking memory ranges 208 + * mapped in a DRM device. 209 + */ 210 + struct drm_gpusvm_range { 211 + struct drm_gpusvm *gpusvm; 212 + struct drm_gpusvm_notifier *notifier; 213 + struct kref refcount; 214 + struct interval_tree_node itree; 215 + struct list_head entry; 216 + unsigned long notifier_seq; 217 + struct drm_pagemap_device_addr *dma_addr; 218 + struct drm_pagemap *dpagemap; 219 + struct { 220 + /* All flags below must be set upon creation */ 221 + u16 migrate_devmem : 1; 222 + /* All flags below must be set / cleared under notifier lock */ 223 + u16 unmapped : 1; 224 + u16 partial_unmap : 1; 225 + u16 has_devmem_pages : 1; 226 + u16 has_dma_mapping : 1; 227 + } flags; 228 + }; 229 + 230 + /** 231 + * struct drm_gpusvm - GPU SVM structure 232 + * 233 + * @name: Name of the GPU SVM 234 + * @drm: Pointer to the DRM device structure 235 + * @mm: Pointer to the mm_struct for the address space 236 + * @device_private_page_owner: Device private pages owner 237 + * @mm_start: Start address of GPU SVM 238 + * @mm_range: Range of the GPU SVM 239 + * @notifier_size: Size of individual notifiers 240 + * @ops: Pointer to the operations structure for GPU SVM 241 + * @chunk_sizes: Pointer to the array of chunk sizes used in range allocation. 242 + * Entries should be powers of 2 in descending order. 
243 + * @num_chunks: Number of chunks 244 + * @notifier_lock: Read-write semaphore for protecting notifier operations 245 + * @root: Cached root node of the Red-Black tree containing GPU SVM notifiers 246 + * @notifier_list: list head containing of notifiers in the same order they 247 + * appear in interval tree. This is useful to keep iterating 248 + * notifiers while doing modifications to RB tree. 249 + * 250 + * This structure represents a GPU SVM (Shared Virtual Memory) used for tracking 251 + * memory ranges mapped in a DRM (Direct Rendering Manager) device. 252 + * 253 + * No reference counting is provided, as this is expected to be embedded in the 254 + * driver VM structure along with the struct drm_gpuvm, which handles reference 255 + * counting. 256 + */ 257 + struct drm_gpusvm { 258 + const char *name; 259 + struct drm_device *drm; 260 + struct mm_struct *mm; 261 + void *device_private_page_owner; 262 + unsigned long mm_start; 263 + unsigned long mm_range; 264 + unsigned long notifier_size; 265 + const struct drm_gpusvm_ops *ops; 266 + const unsigned long *chunk_sizes; 267 + int num_chunks; 268 + struct rw_semaphore notifier_lock; 269 + struct rb_root_cached root; 270 + struct list_head notifier_list; 271 + #ifdef CONFIG_LOCKDEP 272 + /** 273 + * @lock_dep_map: Annotates drm_gpusvm_range_find_or_insert and 274 + * drm_gpusvm_range_remove with a driver provided lock. 275 + */ 276 + struct lockdep_map *lock_dep_map; 277 + #endif 278 + }; 279 + 280 + /** 281 + * struct drm_gpusvm_ctx - DRM GPU SVM context 282 + * 283 + * @check_pages_threshold: Check CPU pages for present if chunk is less than or 284 + * equal to threshold. If not present, reduce chunk 285 + * size. 286 + * @in_notifier: entering from a MMU notifier 287 + * @read_only: operating on read-only memory 288 + * @devmem_possible: possible to use device memory 289 + * 290 + * Context that is DRM GPUSVM is operating in (i.e. user arguments). 
291 + */ 292 + struct drm_gpusvm_ctx { 293 + unsigned long check_pages_threshold; 294 + unsigned int in_notifier :1; 295 + unsigned int read_only :1; 296 + unsigned int devmem_possible :1; 297 + }; 298 + 299 + int drm_gpusvm_init(struct drm_gpusvm *gpusvm, 300 + const char *name, struct drm_device *drm, 301 + struct mm_struct *mm, void *device_private_page_owner, 302 + unsigned long mm_start, unsigned long mm_range, 303 + unsigned long notifier_size, 304 + const struct drm_gpusvm_ops *ops, 305 + const unsigned long *chunk_sizes, int num_chunks); 306 + 307 + void drm_gpusvm_fini(struct drm_gpusvm *gpusvm); 308 + 309 + void drm_gpusvm_free(struct drm_gpusvm *gpusvm); 310 + 311 + struct drm_gpusvm_range * 312 + drm_gpusvm_range_find_or_insert(struct drm_gpusvm *gpusvm, 313 + unsigned long fault_addr, 314 + unsigned long gpuva_start, 315 + unsigned long gpuva_end, 316 + const struct drm_gpusvm_ctx *ctx); 317 + 318 + void drm_gpusvm_range_remove(struct drm_gpusvm *gpusvm, 319 + struct drm_gpusvm_range *range); 320 + 321 + int drm_gpusvm_range_evict(struct drm_gpusvm *gpusvm, 322 + struct drm_gpusvm_range *range); 323 + 324 + struct drm_gpusvm_range * 325 + drm_gpusvm_range_get(struct drm_gpusvm_range *range); 326 + 327 + void drm_gpusvm_range_put(struct drm_gpusvm_range *range); 328 + 329 + bool drm_gpusvm_range_pages_valid(struct drm_gpusvm *gpusvm, 330 + struct drm_gpusvm_range *range); 331 + 332 + int drm_gpusvm_range_get_pages(struct drm_gpusvm *gpusvm, 333 + struct drm_gpusvm_range *range, 334 + const struct drm_gpusvm_ctx *ctx); 335 + 336 + void drm_gpusvm_range_unmap_pages(struct drm_gpusvm *gpusvm, 337 + struct drm_gpusvm_range *range, 338 + const struct drm_gpusvm_ctx *ctx); 339 + 340 + int drm_gpusvm_migrate_to_devmem(struct drm_gpusvm *gpusvm, 341 + struct drm_gpusvm_range *range, 342 + struct drm_gpusvm_devmem *devmem_allocation, 343 + const struct drm_gpusvm_ctx *ctx); 344 + 345 + int drm_gpusvm_evict_to_ram(struct drm_gpusvm_devmem *devmem_allocation); 346 
+ 347 + const struct dev_pagemap_ops *drm_gpusvm_pagemap_ops_get(void); 348 + 349 + bool drm_gpusvm_has_mapping(struct drm_gpusvm *gpusvm, unsigned long start, 350 + unsigned long end); 351 + 352 + struct drm_gpusvm_range * 353 + drm_gpusvm_range_find(struct drm_gpusvm_notifier *notifier, unsigned long start, 354 + unsigned long end); 355 + 356 + void drm_gpusvm_range_set_unmapped(struct drm_gpusvm_range *range, 357 + const struct mmu_notifier_range *mmu_range); 358 + 359 + void drm_gpusvm_devmem_init(struct drm_gpusvm_devmem *devmem_allocation, 360 + struct device *dev, struct mm_struct *mm, 361 + const struct drm_gpusvm_devmem_ops *ops, 362 + struct drm_pagemap *dpagemap, size_t size); 363 + 364 + #ifdef CONFIG_LOCKDEP 365 + /** 366 + * drm_gpusvm_driver_set_lock() - Set the lock protecting accesses to GPU SVM 367 + * @gpusvm: Pointer to the GPU SVM structure. 368 + * @lock: the lock used to protect the gpuva list. The locking primitive 369 + * must contain a dep_map field. 370 + * 371 + * Call this to annotate drm_gpusvm_range_find_or_insert and 372 + * drm_gpusvm_range_remove. 373 + */ 374 + #define drm_gpusvm_driver_set_lock(gpusvm, lock) \ 375 + do { \ 376 + if (!WARN((gpusvm)->lock_dep_map, \ 377 + "GPUSVM range lock should be set only once."))\ 378 + (gpusvm)->lock_dep_map = &(lock)->dep_map; \ 379 + } while (0) 380 + #else 381 + #define drm_gpusvm_driver_set_lock(gpusvm, lock) do {} while (0) 382 + #endif 383 + 384 + /** 385 + * drm_gpusvm_notifier_lock() - Lock GPU SVM notifier 386 + * @gpusvm__: Pointer to the GPU SVM structure. 387 + * 388 + * Abstract client usage GPU SVM notifier lock, take lock 389 + */ 390 + #define drm_gpusvm_notifier_lock(gpusvm__) \ 391 + down_read(&(gpusvm__)->notifier_lock) 392 + 393 + /** 394 + * drm_gpusvm_notifier_unlock() - Unlock GPU SVM notifier 395 + * @gpusvm__: Pointer to the GPU SVM structure. 
396 + * 397 + * Abstract client usage GPU SVM notifier lock, drop lock 398 + */ 399 + #define drm_gpusvm_notifier_unlock(gpusvm__) \ 400 + up_read(&(gpusvm__)->notifier_lock) 401 + 402 + /** 403 + * drm_gpusvm_range_start() - GPU SVM range start address 404 + * @range: Pointer to the GPU SVM range 405 + * 406 + * Return: GPU SVM range start address 407 + */ 408 + static inline unsigned long 409 + drm_gpusvm_range_start(struct drm_gpusvm_range *range) 410 + { 411 + return range->itree.start; 412 + } 413 + 414 + /** 415 + * drm_gpusvm_range_end() - GPU SVM range end address 416 + * @range: Pointer to the GPU SVM range 417 + * 418 + * Return: GPU SVM range end address 419 + */ 420 + static inline unsigned long 421 + drm_gpusvm_range_end(struct drm_gpusvm_range *range) 422 + { 423 + return range->itree.last + 1; 424 + } 425 + 426 + /** 427 + * drm_gpusvm_range_size() - GPU SVM range size 428 + * @range: Pointer to the GPU SVM range 429 + * 430 + * Return: GPU SVM range size 431 + */ 432 + static inline unsigned long 433 + drm_gpusvm_range_size(struct drm_gpusvm_range *range) 434 + { 435 + return drm_gpusvm_range_end(range) - drm_gpusvm_range_start(range); 436 + } 437 + 438 + /** 439 + * drm_gpusvm_notifier_start() - GPU SVM notifier start address 440 + * @notifier: Pointer to the GPU SVM notifier 441 + * 442 + * Return: GPU SVM notifier start address 443 + */ 444 + static inline unsigned long 445 + drm_gpusvm_notifier_start(struct drm_gpusvm_notifier *notifier) 446 + { 447 + return notifier->itree.start; 448 + } 449 + 450 + /** 451 + * drm_gpusvm_notifier_end() - GPU SVM notifier end address 452 + * @notifier: Pointer to the GPU SVM notifier 453 + * 454 + * Return: GPU SVM notifier end address 455 + */ 456 + static inline unsigned long 457 + drm_gpusvm_notifier_end(struct drm_gpusvm_notifier *notifier) 458 + { 459 + return notifier->itree.last + 1; 460 + } 461 + 462 + /** 463 + * drm_gpusvm_notifier_size() - GPU SVM notifier size 464 + * @notifier: Pointer to the GPU 
SVM notifier 465 + * 466 + * Return: GPU SVM notifier size 467 + */ 468 + static inline unsigned long 469 + drm_gpusvm_notifier_size(struct drm_gpusvm_notifier *notifier) 470 + { 471 + return drm_gpusvm_notifier_end(notifier) - 472 + drm_gpusvm_notifier_start(notifier); 473 + } 474 + 475 + /** 476 + * __drm_gpusvm_range_next() - Get the next GPU SVM range in the list 477 + * @range: a pointer to the current GPU SVM range 478 + * 479 + * Return: A pointer to the next drm_gpusvm_range if available, or NULL if the 480 + * current range is the last one or if the input range is NULL. 481 + */ 482 + static inline struct drm_gpusvm_range * 483 + __drm_gpusvm_range_next(struct drm_gpusvm_range *range) 484 + { 485 + if (range && !list_is_last(&range->entry, 486 + &range->notifier->range_list)) 487 + return list_next_entry(range, entry); 488 + 489 + return NULL; 490 + } 491 + 492 + /** 493 + * drm_gpusvm_for_each_range() - Iterate over GPU SVM ranges in a notifier 494 + * @range__: Iterator variable for the ranges. If set, it indicates the start of 495 + * the iterator. If NULL, call drm_gpusvm_range_find() to get the range. 496 + * @notifier__: Pointer to the GPU SVM notifier 497 + * @start__: Start address of the range 498 + * @end__: End address of the range 499 + * 500 + * This macro is used to iterate over GPU SVM ranges in a notifier. It is safe 501 + * to use while holding the driver SVM lock or the notifier lock. 502 + */ 503 + #define drm_gpusvm_for_each_range(range__, notifier__, start__, end__) \ 504 + for ((range__) = (range__) ?: \ 505 + drm_gpusvm_range_find((notifier__), (start__), (end__)); \ 506 + (range__) && (drm_gpusvm_range_start(range__) < (end__)); \ 507 + (range__) = __drm_gpusvm_range_next(range__)) 508 + 509 + #endif /* __DRM_GPUSVM_H__ */
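
The accessor helpers in this header turn the interval tree's inclusive `last` into an exclusive end address, so ranges and notifiers read as half-open intervals `[start, end)`, and `drm_gpusvm_for_each_range()` stops at the first range whose start is not below `end__`. A minimal user-space sketch of that convention follows. `struct sketch_itree_node`, `struct sketch_range`, and every `sketch_*` helper are hypothetical stand-ins for illustration, not the kernel types, and the singly linked `next` pointer replaces the kernel's `list_head`.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct interval_tree_node: [start, last], inclusive. */
struct sketch_itree_node {
	unsigned long start;
	unsigned long last;
};

/* Hypothetical, reduced stand-in for struct drm_gpusvm_range. */
struct sketch_range {
	struct sketch_itree_node itree;
	struct sketch_range *next;	/* NULL terminates the list */
};

static unsigned long sketch_range_start(const struct sketch_range *r)
{
	return r->itree.start;
}

/* Exclusive end: one past the inclusive last address. */
static unsigned long sketch_range_end(const struct sketch_range *r)
{
	return r->itree.last + 1;
}

static unsigned long sketch_range_size(const struct sketch_range *r)
{
	return sketch_range_end(r) - sketch_range_start(r);
}

/*
 * Count ranges whose start lies below 'end', using the same loop
 * condition as drm_gpusvm_for_each_range().
 */
static size_t sketch_count_ranges(const struct sketch_range *first,
				  unsigned long end)
{
	size_t n = 0;

	for (const struct sketch_range *r = first;
	     r && sketch_range_start(r) < end; r = r->next)
		n++;

	return n;
}
```

Because the end is exclusive, the size is a plain subtraction and two adjacent ranges share a boundary value without overlapping.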

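The `@chunk_sizes` documentation in `struct drm_gpusvm` asks for powers of 2 in descending order. The sketch below shows one way a driver might sanity-check such an array before passing it to `drm_gpusvm_init()`, and one plausible way a descending list could be consumed: pick the largest chunk whose naturally aligned window around the fault address fits inside the allowed bounds. Both helpers are hypothetical, treating "descending" as strictly descending is an assumption of the sketch, and the selection policy is a guess rather than the behaviour of the GPU SVM core.

```c
#include <stdbool.h>
#include <stddef.h>

static bool sketch_is_pow2(unsigned long v)
{
	return v && !(v & (v - 1));
}

/* Check: non-empty, every entry a power of 2, strictly descending (assumed). */
static bool sketch_chunk_sizes_valid(const unsigned long *chunk_sizes,
				     int num_chunks)
{
	if (!chunk_sizes || num_chunks <= 0)
		return false;

	for (int i = 0; i < num_chunks; i++) {
		if (!sketch_is_pow2(chunk_sizes[i]))
			return false;
		if (i && chunk_sizes[i] >= chunk_sizes[i - 1])
			return false;
	}

	return true;
}

/*
 * Hypothetical policy: return the first (largest) chunk whose aligned
 * window containing fault_addr lies within [lo, hi), or 0 if none fits.
 */
static unsigned long sketch_pick_chunk(const unsigned long *chunk_sizes,
				       int num_chunks,
				       unsigned long fault_addr,
				       unsigned long lo, unsigned long hi)
{
	for (int i = 0; i < num_chunks; i++) {
		unsigned long start = fault_addr & ~(chunk_sizes[i] - 1);
		unsigned long end = start + chunk_sizes[i];

		if (start >= lo && end <= hi)
			return chunk_sizes[i];
	}

	return 0;
}
```

Keeping the array sorted largest-first lets a single forward scan prefer big mappings and fall back to smaller ones only when alignment or bounds rule the larger sizes out.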
+5

include/drm/drm_gpuvm.h

···
812 812 	 * @DRM_GPUVA_OP_PREFETCH: the prefetch op type
813 813 	 */
814 814 	DRM_GPUVA_OP_PREFETCH,
815 + 
816 + 	/**
817 + 	 * @DRM_GPUVA_OP_DRIVER: the driver defined op type
818 + 	 */
819 + 	DRM_GPUVA_OP_DRIVER,
815 820 };
816 821 
817 822 /**

+107

include/drm/drm_pagemap.h

··· 1 + /* SPDX-License-Identifier: MIT */ 2 + #ifndef _DRM_PAGEMAP_H_ 3 + #define _DRM_PAGEMAP_H_ 4 + 5 + #include <linux/dma-direction.h> 6 + #include <linux/hmm.h> 7 + #include <linux/types.h> 8 + 9 + struct drm_pagemap; 10 + struct device; 11 + 12 + /** 13 + * enum drm_interconnect_protocol - Used to identify an interconnect protocol. 14 + * 15 + * @DRM_INTERCONNECT_SYSTEM: DMA map is system pages 16 + * @DRM_INTERCONNECT_DRIVER: DMA map is driver defined 17 + */ 18 + enum drm_interconnect_protocol { 19 + DRM_INTERCONNECT_SYSTEM, 20 + DRM_INTERCONNECT_DRIVER, 21 + /* A driver can add private values beyond DRM_INTERCONNECT_DRIVER */ 22 + }; 23 + 24 + /** 25 + * struct drm_pagemap_device_addr - Device address representation. 26 + * @addr: The dma address or driver-defined address for driver private interconnects. 27 + * @proto: The interconnect protocol. 28 + * @order: The page order of the device mapping. (Size is PAGE_SIZE << order). 29 + * @dir: The DMA direction. 30 + * 31 + * Note: There is room for improvement here. We should be able to pack into 32 + * 64 bits. 33 + */ 34 + struct drm_pagemap_device_addr { 35 + dma_addr_t addr; 36 + u64 proto : 54; 37 + u64 order : 8; 38 + u64 dir : 2; 39 + }; 40 + 41 + /** 42 + * drm_pagemap_device_addr_encode() - Encode a dma address with metadata 43 + * @addr: The dma address or driver-defined address for driver private interconnects. 44 + * @proto: The interconnect protocol. 45 + * @order: The page order of the dma mapping. (Size is PAGE_SIZE << order). 46 + * @dir: The DMA direction. 47 + * 48 + * Return: A struct drm_pagemap_device_addr encoding the above information. 
49 + */ 50 + static inline struct drm_pagemap_device_addr 51 + drm_pagemap_device_addr_encode(dma_addr_t addr, 52 + enum drm_interconnect_protocol proto, 53 + unsigned int order, 54 + enum dma_data_direction dir) 55 + { 56 + return (struct drm_pagemap_device_addr) { 57 + .addr = addr, 58 + .proto = proto, 59 + .order = order, 60 + .dir = dir, 61 + }; 62 + } 63 + 64 + /** 65 + * struct drm_pagemap_ops: Ops for a drm-pagemap. 66 + */ 67 + struct drm_pagemap_ops { 68 + /** 69 + * @device_map: Map for device access or provide a virtual address suitable for 70 + * 71 + * @dpagemap: The struct drm_pagemap for the page. 72 + * @dev: The device mapper. 73 + * @page: The page to map. 74 + * @order: The page order of the device mapping. (Size is PAGE_SIZE << order). 75 + * @dir: The transfer direction. 76 + */ 77 + struct drm_pagemap_device_addr (*device_map)(struct drm_pagemap *dpagemap, 78 + struct device *dev, 79 + struct page *page, 80 + unsigned int order, 81 + enum dma_data_direction dir); 82 + 83 + /** 84 + * @device_unmap: Unmap a device address previously obtained using @device_map. 85 + * 86 + * @dpagemap: The struct drm_pagemap for the mapping. 87 + * @dev: The device unmapper. 88 + * @addr: The device address obtained when mapping. 89 + */ 90 + void (*device_unmap)(struct drm_pagemap *dpagemap, 91 + struct device *dev, 92 + struct drm_pagemap_device_addr addr); 93 + 94 + }; 95 + 96 + /** 97 + * struct drm_pagemap: Additional information for a struct dev_pagemap 98 + * used for device p2p handshaking. 99 + * @ops: The struct drm_pagemap_ops. 100 + * @dev: The struct drevice owning the device-private memory. 101 + */ 102 + struct drm_pagemap { 103 + const struct drm_pagemap_ops *ops; 104 + struct device *dev; 105 + }; 106 + 107 + #endif
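
`struct drm_pagemap_device_addr` carries a full `dma_addr_t` plus three bit-fields (54 + 8 + 2 = 64 bits), which is why its kernel-doc notes that the whole thing could in principle be packed tighter. The user-space sketch below mirrors that layout with fixed-width types to show the resulting size and the `PAGE_SIZE << order` relationship. The `sketch_*` names are hypothetical, `uint64_t` stands in for `dma_addr_t`, and the 16-byte size holds on common 64-bit ABIs with GCC or Clang; bit-field layout is implementation-defined, so treat it as an observation, not a guarantee.

```c
#include <stdint.h>

/* Hypothetical mirror of enum drm_interconnect_protocol. */
enum sketch_protocol {
	SKETCH_INTERCONNECT_SYSTEM,
	SKETCH_INTERCONNECT_DRIVER,
};

/* Hypothetical mirror of struct drm_pagemap_device_addr. */
struct sketch_device_addr {
	uint64_t addr;		/* stands in for dma_addr_t */
	uint64_t proto : 54;
	uint64_t order : 8;
	uint64_t dir : 2;	/* two bits cover four direction values */
};

static struct sketch_device_addr
sketch_device_addr_encode(uint64_t addr, enum sketch_protocol proto,
			  unsigned int order, unsigned int dir)
{
	return (struct sketch_device_addr) {
		.addr = addr,
		.proto = proto,
		.order = order,
		.dir = dir,
	};
}

/* Size of the mapping described by an encoded address. */
static uint64_t sketch_mapping_size(struct sketch_device_addr a,
				    uint64_t page_size)
{
	return page_size << a.order;
}
```

With a 4 KiB page size, an order of 9 describes a 2 MiB mapping, so a single array entry can stand for a large contiguous block.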

+1

include/linux/migrate.h

···
227 227 void migrate_vma_finalize(struct migrate_vma *migrate);
228 228 int migrate_device_range(unsigned long *src_pfns, unsigned long start,
229 229 			unsigned long npages);
230 + int migrate_device_pfns(unsigned long *src_pfns, unsigned long npages);
230 231 void migrate_device_pages(unsigned long *src_pfns, unsigned long *dst_pfns,
231 232 			unsigned long npages);
232 233 void migrate_device_finalize(unsigned long *src_pfns,
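
The new `migrate_device_pfns()` takes only an array and a count, whereas `migrate_device_range()` takes a starting pfn and walks a contiguous run. The sketch below contrasts the two calling conventions around a shared per-pfn helper. It is a user-space illustration under stated assumptions: the flag values and shift are illustrative placeholders, the "odd pfns are lockable" rule is a mock for the real folio lookup and trylock, and the idea that the array variant rewrites each caller-supplied entry in place is an assumption based on the `src_pfns` description as a pre-populated array.

```c
#include <stddef.h>

/* Illustrative placeholders only, not the kernel's MIGRATE_PFN_* values. */
#define SKETCH_PFN_VALID	(1UL << 0)
#define SKETCH_PFN_MIGRATE	(1UL << 1)
#define SKETCH_PFN_SHIFT	6

/*
 * Mock per-pfn helper: returns 0 when the page cannot be grabbed and
 * locked, otherwise the encoded entry. Here odd pfns are "lockable".
 */
static unsigned long sketch_pfn_lock(unsigned long pfn)
{
	if (!(pfn & 1UL))
		return 0;

	return (pfn << SKETCH_PFN_SHIFT) | SKETCH_PFN_VALID |
	       SKETCH_PFN_MIGRATE;
}

/* Range style: pfns are start, start + 1, ... */
static void sketch_collect_range(unsigned long *src_pfns, unsigned long start,
				 size_t npages)
{
	for (size_t i = 0; i < npages; i++)
		src_pfns[i] = sketch_pfn_lock(start + i);
}

/* Array style: the caller pre-populates src_pfns with arbitrary pfns. */
static void sketch_collect_pfns(unsigned long *src_pfns, size_t npages)
{
	for (size_t i = 0; i < npages; i++)
		src_pfns[i] = sketch_pfn_lock(src_pfns[i]);
}
```

The array form suits allocations whose device pages are not physically contiguous, since the caller can list exactly the pfns it owns.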

+115 -2

include/uapi/drm/xe_drm.h

··· 393 393 * 394 394 * - %DRM_XE_QUERY_CONFIG_FLAG_HAS_VRAM - Flag is set if the device 395 395 * has usable VRAM 396 + * - %DRM_XE_QUERY_CONFIG_FLAG_HAS_LOW_LATENCY - Flag is set if the device 397 + * has low latency hint support 398 + * - %DRM_XE_QUERY_CONFIG_FLAG_HAS_CPU_ADDR_MIRROR - Flag is set if the 399 + * device has CPU address mirroring support 396 400 * - %DRM_XE_QUERY_CONFIG_MIN_ALIGNMENT - Minimal memory alignment 397 401 * required by this device, typically SZ_4K or SZ_64K 398 402 * - %DRM_XE_QUERY_CONFIG_VA_BITS - Maximum bits of a virtual address ··· 413 409 #define DRM_XE_QUERY_CONFIG_REV_AND_DEVICE_ID 0 414 410 #define DRM_XE_QUERY_CONFIG_FLAGS 1 415 411 #define DRM_XE_QUERY_CONFIG_FLAG_HAS_VRAM (1 << 0) 412 + #define DRM_XE_QUERY_CONFIG_FLAG_HAS_LOW_LATENCY (1 << 1) 413 + #define DRM_XE_QUERY_CONFIG_FLAG_HAS_CPU_ADDR_MIRROR (1 << 2) 416 414 #define DRM_XE_QUERY_CONFIG_MIN_ALIGNMENT 2 417 415 #define DRM_XE_QUERY_CONFIG_VA_BITS 3 418 416 #define DRM_XE_QUERY_CONFIG_MAX_EXEC_QUEUE_PRIORITY 4 ··· 741 735 #define DRM_XE_DEVICE_QUERY_UC_FW_VERSION 7 742 736 #define DRM_XE_DEVICE_QUERY_OA_UNITS 8 743 737 #define DRM_XE_DEVICE_QUERY_PXP_STATUS 9 738 + #define DRM_XE_DEVICE_QUERY_EU_STALL 10 744 739 /** @query: The type of data to query */ 745 740 __u32 query; 746 741 ··· 993 986 * - %DRM_XE_VM_BIND_FLAG_CHECK_PXP - If the object is encrypted via PXP, 994 987 * reject the binding if the encryption key is no longer valid. This 995 988 * flag has no effect on BOs that are not marked as using PXP. 989 + * - %DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR - When the CPU address mirror flag is 990 + * set, no mappings are created rather the range is reserved for CPU address 991 + * mirroring which will be populated on GPU page faults or prefetches. Only 992 + * valid on VMs with DRM_XE_VM_CREATE_FLAG_FAULT_MODE set. The CPU address 993 + * mirror flag are only valid for DRM_XE_VM_BIND_OP_MAP operations, the BO 994 + * handle MBZ, and the BO offset MBZ. 
996 995 */ 997 996 struct drm_xe_vm_bind_op { 998 997 /** @extensions: Pointer to the first extension struct, if any */ ··· 1051 1038 * on the @pat_index. For such mappings there is no actual memory being 1052 1039 * mapped (the address in the PTE is invalid), so the various PAT memory 1053 1040 * attributes likely do not apply. Simply leaving as zero is one 1054 - * option (still a valid pat_index). 1041 + * option (still a valid pat_index). Same applies to 1042 + * DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR bindings as for such mapping 1043 + * there is no actual memory being mapped. 1055 1044 */ 1056 1045 __u16 pat_index; 1057 1046 ··· 1069 1054 1070 1055 /** @userptr: user pointer to bind on */ 1071 1056 __u64 userptr; 1057 + 1058 + /** 1059 + * @cpu_addr_mirror_offset: Offset from GPU @addr to create 1060 + * CPU address mirror mappings. MBZ with current level of 1061 + * support (e.g. 1 to 1 mapping between GPU and CPU mappings 1062 + * only supported). 1063 + */ 1064 + __s64 cpu_addr_mirror_offset; 1072 1065 }; 1073 1066 1074 1067 /** ··· 1100 1077 #define DRM_XE_VM_BIND_FLAG_NULL (1 << 2) 1101 1078 #define DRM_XE_VM_BIND_FLAG_DUMPABLE (1 << 3) 1102 1079 #define DRM_XE_VM_BIND_FLAG_CHECK_PXP (1 << 4) 1080 + #define DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR (1 << 5) 1103 1081 /** @flags: Bind flags */ 1104 1082 __u32 flags; 1105 1083 ··· 1228 1204 * }; 1229 1205 * ioctl(fd, DRM_IOCTL_XE_EXEC_QUEUE_CREATE, &exec_queue_create); 1230 1206 * 1207 + * Allow users to provide a hint to kernel for cases demanding low latency 1208 + * profile. Please note it will have impact on power consumption. 
User can 1209 + * indicate low latency hint with flag while creating exec queue as 1210 + * mentioned below, 1211 + * 1212 + * struct drm_xe_exec_queue_create exec_queue_create = { 1213 + * .flags = DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT, 1214 + * .extensions = 0, 1215 + * .vm_id = vm, 1216 + * .num_bb_per_exec = 1, 1217 + * .num_eng_per_bb = 1, 1218 + * .instances = to_user_pointer(&instance), 1219 + * }; 1220 + * ioctl(fd, DRM_IOCTL_XE_EXEC_QUEUE_CREATE, &exec_queue_create); 1221 + * 1231 1222 */ 1232 1223 struct drm_xe_exec_queue_create { 1233 1224 #define DRM_XE_EXEC_QUEUE_EXTENSION_SET_PROPERTY 0 ··· 1261 1222 /** @vm_id: VM to use for this exec queue */ 1262 1223 __u32 vm_id; 1263 1224 1264 - /** @flags: MBZ */ 1225 + #define DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT (1 << 0) 1226 + /** @flags: flags to use for this exec queue */ 1265 1227 __u32 flags; 1266 1228 1267 1229 /** @exec_queue_id: Returned exec queue ID */ ··· 1536 1496 enum drm_xe_observation_type { 1537 1497 /** @DRM_XE_OBSERVATION_TYPE_OA: OA observation stream type */ 1538 1498 DRM_XE_OBSERVATION_TYPE_OA, 1499 + /** @DRM_XE_OBSERVATION_TYPE_EU_STALL: EU stall sampling observation stream type */ 1500 + DRM_XE_OBSERVATION_TYPE_EU_STALL, 1539 1501 }; 1540 1502 1541 1503 /** ··· 1889 1847 1890 1848 /* ID of the protected content session managed by Xe when PXP is active */ 1891 1849 #define DRM_XE_PXP_HWDRM_DEFAULT_SESSION 0xf 1850 + 1851 + /** 1852 + * enum drm_xe_eu_stall_property_id - EU stall sampling input property ids. 1853 + * 1854 + * These properties are passed to the driver at open as a chain of 1855 + * @drm_xe_ext_set_property structures with @property set to these 1856 + * properties' enums and @value set to the corresponding values of these 1857 + * properties. @drm_xe_user_extension base.name should be set to 1858 + * @DRM_XE_EU_STALL_EXTENSION_SET_PROPERTY. 
1859 + * 1860 + * With the file descriptor obtained from open, user space must enable 1861 + * the EU stall stream fd with @DRM_XE_OBSERVATION_IOCTL_ENABLE before 1862 + * calling read(). EIO errno from read() indicates HW dropped data 1863 + * due to full buffer. 1864 + */ 1865 + enum drm_xe_eu_stall_property_id { 1866 + #define DRM_XE_EU_STALL_EXTENSION_SET_PROPERTY 0 1867 + /** 1868 + * @DRM_XE_EU_STALL_PROP_GT_ID: @gt_id of the GT on which 1869 + * EU stall data will be captured. 1870 + */ 1871 + DRM_XE_EU_STALL_PROP_GT_ID = 1, 1872 + 1873 + /** 1874 + * @DRM_XE_EU_STALL_PROP_SAMPLE_RATE: Sampling rate in 1875 + * GPU cycles from @sampling_rates in struct @drm_xe_query_eu_stall 1876 + */ 1877 + DRM_XE_EU_STALL_PROP_SAMPLE_RATE, 1878 + 1879 + /** 1880 + * @DRM_XE_EU_STALL_PROP_WAIT_NUM_REPORTS: Minimum number of 1881 + * EU stall data reports to be present in the kernel buffer 1882 + * before unblocking a blocked poll or read. 1883 + */ 1884 + DRM_XE_EU_STALL_PROP_WAIT_NUM_REPORTS, 1885 + }; 1886 + 1887 + /** 1888 + * struct drm_xe_query_eu_stall - Information about EU stall sampling. 1889 + * 1890 + * If a query is made with a struct @drm_xe_device_query where .query 1891 + * is equal to @DRM_XE_DEVICE_QUERY_EU_STALL, then the reply uses 1892 + * struct @drm_xe_query_eu_stall in .data. 
1893 + */ 1894 + struct drm_xe_query_eu_stall { 1895 + /** @extensions: Pointer to the first extension struct, if any */ 1896 + __u64 extensions; 1897 + 1898 + /** @capabilities: EU stall capabilities bit-mask */ 1899 + __u64 capabilities; 1900 + #define DRM_XE_EU_STALL_CAPS_BASE (1 << 0) 1901 + 1902 + /** @record_size: size of each EU stall data record */ 1903 + __u64 record_size; 1904 + 1905 + /** @per_xecore_buf_size: internal per XeCore buffer size */ 1906 + __u64 per_xecore_buf_size; 1907 + 1908 + /** @reserved: Reserved */ 1909 + __u64 reserved[5]; 1910 + 1911 + /** @num_sampling_rates: Number of sampling rates in @sampling_rates array */ 1912 + __u64 num_sampling_rates; 1913 + 1914 + /** 1915 + * @sampling_rates: Flexible array of sampling rates 1916 + * sorted in the fastest to slowest order. 1917 + * Sampling rates are specified in GPU clock cycles. 1918 + */ 1919 + __u64 sampling_rates[]; 1920 + }; 1892 1921 1893 1922 #if defined(__cplusplus) 1894 1923 }
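
The uapi additions attach several must-be-zero rules to `DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR`: the VM must be in fault mode, the operation must be a map, and the BO handle, BO offset, and `cpu_addr_mirror_offset` must all be zero. The sketch below expresses those rules, plus the config-flag capability check, as plain predicates. It does not include the real header: the `SKETCH_*` macros copy only the bit positions shown in this diff, and `struct sketch_bind_op` is a hypothetical reduction of `struct drm_xe_vm_bind_op` with a boolean in place of the real op code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as shown in the diff; SKETCH_ prefix avoids real-header clashes. */
#define SKETCH_CONFIG_FLAG_HAS_VRAM		(1u << 0)
#define SKETCH_CONFIG_FLAG_HAS_LOW_LATENCY	(1u << 1)
#define SKETCH_CONFIG_FLAG_HAS_CPU_ADDR_MIRROR	(1u << 2)
#define SKETCH_BIND_FLAG_CPU_ADDR_MIRROR	(1u << 5)

/* Hypothetical, reduced view of a bind operation. */
struct sketch_bind_op {
	bool is_map_op;			/* true for a map operation */
	uint32_t flags;
	uint32_t obj;			/* BO handle */
	uint64_t obj_offset;		/* BO offset */
	int64_t cpu_addr_mirror_offset;
};

/* Capability check on the value reported for the config flags query. */
static bool sketch_has_cpu_addr_mirror(uint64_t config_flags)
{
	return (config_flags & SKETCH_CONFIG_FLAG_HAS_CPU_ADDR_MIRROR) != 0;
}

/* Apply the documented constraints for a CPU address mirror bind. */
static bool sketch_mirror_bind_ok(const struct sketch_bind_op *op,
				  bool vm_fault_mode)
{
	if (!(op->flags & SKETCH_BIND_FLAG_CPU_ADDR_MIRROR))
		return true;	/* rules only apply when the flag is set */

	return vm_fault_mode && op->is_map_op && op->obj == 0 &&
	       op->obj_offset == 0 && op->cpu_addr_mirror_offset == 0;
}
```

User space would typically run the capability check once at start-up and only then attempt mirror binds, since the flag is reported per device.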

+9 -4

mm/memory.c

···
4348 4348 			 * Get a page reference while we know the page can't be
4349 4349 			 * freed.
4350 4350 			 */
4351 - 			get_page(vmf->page);
4352 - 			pte_unmap_unlock(vmf->pte, vmf->ptl);
4353 - 			ret = vmf->page->pgmap->ops->migrate_to_ram(vmf);
4354 - 			put_page(vmf->page);
4351 + 			if (trylock_page(vmf->page)) {
4352 + 				get_page(vmf->page);
4353 + 				pte_unmap_unlock(vmf->pte, vmf->ptl);
4354 + 				ret = vmf->page->pgmap->ops->migrate_to_ram(vmf);
4355 + 				unlock_page(vmf->page);
4356 + 				put_page(vmf->page);
4357 + 			} else {
4358 + 				pte_unmap_unlock(vmf->pte, vmf->ptl);
4359 + 			}
4355 4360 		} else if (is_hwpoison_entry(entry)) {
4356 4361 			ret = VM_FAULT_HWPOISON;
4357 4362 		} else if (is_pte_marker_entry(entry)) {
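
This hunk makes the fault path take the page lock with a trylock before calling `migrate_to_ram`, and simply drop the PTE lock when the trylock fails. The design notes at the top of this merge tie the change to a livelock seen when a driver-exclusive lock was used inside `migrate_to_ram`. The user-space sketch below models only the control flow: a single winner performs the migration while holding the page lock, and a loser backs off without blocking, presumably so the fault can be retried. `struct sketch_page` and the `sketch_*` functions are hypothetical mocks, and the atomic boolean stands in for the real page lock bit.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical mock of the state the fault path touches. */
struct sketch_page {
	atomic_bool locked;	/* stands in for the page lock */
	int refcount;
	int migrate_calls;	/* how often the driver callback ran */
};

static bool sketch_trylock_page(struct sketch_page *p)
{
	/* Succeeds only if the previous value was "unlocked". */
	return !atomic_exchange(&p->locked, true);
}

static void sketch_unlock_page(struct sketch_page *p)
{
	atomic_store(&p->locked, false);
}

/* Mock driver callback: runs with the faulting page already locked. */
static int sketch_migrate_to_ram(struct sketch_page *p)
{
	p->migrate_calls++;
	return 0;
}

/*
 * Returns true if migration was attempted. False means another context
 * holds the page lock and this fault backs off without waiting.
 */
static bool sketch_fault_device_private(struct sketch_page *p)
{
	if (!sketch_trylock_page(p))
		return false;

	p->refcount++;			/* like get_page() */
	sketch_migrate_to_ram(p);
	sketch_unlock_page(p);
	p->refcount--;			/* like put_page() */

	return true;
}
```

Since the core fault path already serialises on the page lock, the driver callback has no need for a lock of its own to fend off concurrent faults on the same page.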

+85 -35

mm/migrate_device.c

···
 				   struct mm_walk *walk)
 {
 	struct migrate_vma *migrate = walk->private;
+	struct folio *fault_folio = migrate->fault_page ?
+		page_folio(migrate->fault_page) : NULL;
 	struct vm_area_struct *vma = walk->vma;
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long addr = start, unmapped = 0;
···

 			folio_get(folio);
 			spin_unlock(ptl);
+			/* FIXME: we don't expect THP for fault_folio */
+			if (WARN_ON_ONCE(fault_folio == folio))
+				return migrate_vma_collect_skip(start, end,
+								walk);
 			if (unlikely(!folio_trylock(folio)))
 				return migrate_vma_collect_skip(start, end,
 								walk);
 			ret = split_folio(folio);
-			folio_unlock(folio);
+			if (fault_folio != folio)
+				folio_unlock(folio);
 			folio_put(folio);
 			if (ret)
 				return migrate_vma_collect_skip(start, end,
···
 		 * optimisation to avoid walking the rmap later with
 		 * try_to_migrate().
 		 */
-		if (folio_trylock(folio)) {
+		if (fault_folio == folio || folio_trylock(folio)) {
 			bool anon_exclusive;
 			pte_t swp_pte;

···

 				if (folio_try_share_anon_rmap_pte(folio, page)) {
 					set_pte_at(mm, addr, ptep, pte);
-					folio_unlock(folio);
+					if (fault_folio != folio)
+						folio_unlock(folio);
 					folio_put(folio);
 					mpfn = 0;
 					goto next;
···
 					  unsigned long npages,
 					  struct page *fault_page)
 {
+	struct folio *fault_folio = fault_page ?
+		page_folio(fault_page) : NULL;
 	unsigned long i, restore = 0;
 	bool allow_drain = true;
 	unsigned long unmapped = 0;
···
 		remove_migration_ptes(folio, folio, 0);

 		src_pfns[i] = 0;
-		folio_unlock(folio);
+		if (fault_folio != folio)
+			folio_unlock(folio);
 		folio_put(folio);
 		restore--;
 	}
···
 	if (!args->src || !args->dst)
 		return -EINVAL;
 	if (args->fault_page && !is_device_private_page(args->fault_page))
+		return -EINVAL;
+	if (args->fault_page && !PageLocked(args->fault_page))
 		return -EINVAL;

 	memset(args->src, 0, sizeof(*args->src) * nr_pages);
···
 }
 EXPORT_SYMBOL(migrate_vma_pages);

-/*
- * migrate_device_finalize() - complete page migration
- * @src_pfns: src_pfns returned from migrate_device_range()
- * @dst_pfns: array of pfns allocated by the driver to migrate memory to
- * @npages: number of pages in the range
- *
- * Completes migration of the page by removing special migration entries.
- * Drivers must ensure copying of page data is complete and visible to the CPU
- * before calling this.
- */
-void migrate_device_finalize(unsigned long *src_pfns,
-			unsigned long *dst_pfns, unsigned long npages)
+static void __migrate_device_finalize(unsigned long *src_pfns,
+				      unsigned long *dst_pfns,
+				      unsigned long npages,
+				      struct page *fault_page)
 {
+	struct folio *fault_folio = fault_page ?
+		page_folio(fault_page) : NULL;
 	unsigned long i;

 	for (i = 0; i < npages; i++) {
···

 		if (!page) {
 			if (dst) {
+				WARN_ON_ONCE(fault_folio == dst);
 				folio_unlock(dst);
 				folio_put(dst);
 			}
···

 		if (!(src_pfns[i] & MIGRATE_PFN_MIGRATE) || !dst) {
 			if (dst) {
+				WARN_ON_ONCE(fault_folio == dst);
 				folio_unlock(dst);
 				folio_put(dst);
 			}
···
 		if (!folio_is_zone_device(dst))
 			folio_add_lru(dst);
 		remove_migration_ptes(src, dst, 0);
-		folio_unlock(src);
+		if (fault_folio != src)
+			folio_unlock(src);
 		folio_put(src);

 		if (dst != src) {
+			WARN_ON_ONCE(fault_folio == dst);
 			folio_unlock(dst);
 			folio_put(dst);
 		}
 	}
+}
+
+/*
+ * migrate_device_finalize() - complete page migration
+ * @src_pfns: src_pfns returned from migrate_device_range()
+ * @dst_pfns: array of pfns allocated by the driver to migrate memory to
+ * @npages: number of pages in the range
+ *
+ * Completes migration of the page by removing special migration entries.
+ * Drivers must ensure copying of page data is complete and visible to the CPU
+ * before calling this.
+ */
+void migrate_device_finalize(unsigned long *src_pfns,
+			unsigned long *dst_pfns, unsigned long npages)
+{
+	return __migrate_device_finalize(src_pfns, dst_pfns, npages, NULL);
 }
 EXPORT_SYMBOL(migrate_device_finalize);

···
  */
 void migrate_vma_finalize(struct migrate_vma *migrate)
 {
-	migrate_device_finalize(migrate->src, migrate->dst, migrate->npages);
+	__migrate_device_finalize(migrate->src, migrate->dst, migrate->npages,
+				  migrate->fault_page);
 }
 EXPORT_SYMBOL(migrate_vma_finalize);
+
+static unsigned long migrate_device_pfn_lock(unsigned long pfn)
+{
+	struct folio *folio;
+
+	folio = folio_get_nontail_page(pfn_to_page(pfn));
+	if (!folio)
+		return 0;
+
+	if (!folio_trylock(folio)) {
+		folio_put(folio);
+		return 0;
+	}
+
+	return migrate_pfn(pfn) | MIGRATE_PFN_MIGRATE;
+}

 /**
  * migrate_device_range() - migrate device private pfns to normal memory.
···
 {
 	unsigned long i, pfn;

-	for (pfn = start, i = 0; i < npages; pfn++, i++) {
-		struct folio *folio;
-
-		folio = folio_get_nontail_page(pfn_to_page(pfn));
-		if (!folio) {
-			src_pfns[i] = 0;
-			continue;
-		}
-
-		if (!folio_trylock(folio)) {
-			src_pfns[i] = 0;
-			folio_put(folio);
-			continue;
-		}
-
-		src_pfns[i] = migrate_pfn(pfn) | MIGRATE_PFN_MIGRATE;
-	}
+	for (pfn = start, i = 0; i < npages; pfn++, i++)
+		src_pfns[i] = migrate_device_pfn_lock(pfn);

 	migrate_device_unmap(src_pfns, npages, NULL);

 	return 0;
 }
 EXPORT_SYMBOL(migrate_device_range);
+
+/**
+ * migrate_device_pfns() - migrate device private pfns to normal memory.
+ * @src_pfns: pre-popluated array of source device private pfns to migrate.
+ * @npages: number of pages to migrate.
+ *
+ * Similar to migrate_device_range() but supports non-contiguous pre-popluated
+ * array of device pages to migrate.
+ */
+int migrate_device_pfns(unsigned long *src_pfns, unsigned long npages)
+{
+	unsigned long i;
+
+	for (i = 0; i < npages; i++)
+		src_pfns[i] = migrate_device_pfn_lock(src_pfns[i]);
+
+	migrate_device_unmap(src_pfns, npages, NULL);
+
+	return 0;
+}
+EXPORT_SYMBOL(migrate_device_pfns);

 /*
  * Migrate a device coherent folio back to normal memory. The caller should have
